Skip to main content
Daniel J Glover
Back to Blog

5 IT incidents of 2025: lessons

11 min read

The numbers are staggering. Between January and September 2025, security researchers recorded 4,701 ransomware incidents globally - a significant increase from the 3,219 recorded during the same period in 2024. Cloud providers experienced over 48,000 outages across their services. A single cryptocurrency exchange lost $1.4 billion in a single attack. These IT incident lessons learned from 2025 demand attention from every technology leader planning for 2026.

Understanding what went wrong - and why - is not merely an academic exercise. Each incident reveals systemic vulnerabilities that affect organisations of all sizes. Whether you manage a small team or oversee enterprise infrastructure, the patterns emerging from this year's disruptions should inform your IT management strategy heading into the new year.

As I discussed in my 2026 IT trends predictions, preemptive security is becoming essential. The incidents below illustrate precisely why reactive approaches are no longer sufficient.

1. The CrowdStrike Aftermath: When Security Tools Become Single Points of Failure

Although the CrowdStrike outage occurred in July 2024, its effects rippled throughout 2025, fundamentally changing how organisations approach endpoint security.

What happened: A faulty update to CrowdStrike's Falcon Sensor security software caused approximately 8.5 million Windows systems to crash simultaneously. The root cause was a mismatch between template fields - the update expected 21 input fields while the sensor code only provided 20. A missing runtime bounds check and a logic error in content validation allowed the flawed update through.

Business impact: The worldwide financial damage exceeded $10 billion. Airlines grounded flights, hospitals postponed procedures, banks closed branches, and emergency services experienced disruptions. CrowdStrike's stock dropped 45% over the following 18 days.

Key lessons:

  • Single vendor dependency creates systemic risk. With an 18% global market share, CrowdStrike's failure demonstrated how concentration in security tooling amplifies impact. Organisations should evaluate whether critical security functions depend entirely on one provider.

  • Update deployment models matter. CrowdStrike has since introduced customer-controlled deployment schedules and content pinning. Before this incident, updates deployed simultaneously to all clients. Staggered rollouts should be standard practice.

  • Incident response planning must include security tool failures. Many organisations discovered their incident response plans assumed security infrastructure would be available. The scenario where security tools themselves cause the incident requires separate planning.

  • Recovery time validation is essential. Organisations that had tested recovery procedures fared significantly better. Those that had only documented plans without validation faced extended downtime.

2. UK Retail Sector Attacks: Supply Chain Vulnerabilities Exposed

The spring of 2025 saw coordinated attacks devastate British retail, exposing how interconnected supply chains amplify cybersecurity risk.

What happened: Attackers breached multiple major retailers in quick succession. Marks and Spencer suffered a ransomware attack that M&S chair Archie Norman described as "traumatic." The Co-op lost at least 6.5 million customer records. Disruptions cascaded through logistics systems, leaving grocery shelves empty as backend systems failed. Luxury brands including Adidas, Alexander McQueen, Gucci, and Louis Vuitton also experienced data breaches.

Business impact: M&S estimated the attack would cost the company approximately 300 million GBP in profits. Production operations halted, customer-facing systems went offline, and recovery took weeks rather than days.

Key lessons:

  • Retail's digital transformation has outpaced security investment. The integration of e-commerce platforms, inventory management systems, and customer databases created attack surfaces that legacy security approaches could not adequately protect.

  • Third-party access requires rigorous controls. Several breaches exploited supplier connections. Every external integration point represents a potential entry vector that demands the same scrutiny as internal systems.

  • Data minimisation reduces breach impact. Organisations storing years of customer data suffered greater exposure than those with aggressive data retention policies. Question whether you need to retain historical data, particularly sensitive personal information.

3. European Aviation Ransomware: Critical Infrastructure at Risk

On 19 September 2025, ransomware brought major European airports to a standstill, demonstrating how attacks on shared infrastructure providers can cascade across borders.

What happened: Attackers targeted Collins Aerospace's passenger processing systems (MUSE and vMUSE), which are widely used across European airlines and airports. When these systems failed, check-in, boarding, and flight operations at Heathrow, Brussels, Berlin, and other major hubs experienced significant disruption.

Business impact: Thousands of passengers faced delays and cancellations. Airlines scrambled to implement manual processes. The attack demonstrated that critical aviation infrastructure depends on a small number of vendors whose compromise affects the entire sector.

Key lessons:

  • Shared vendor dependencies in critical infrastructure require sector-wide risk assessment. Individual airlines may have strong security programmes, but their operations depend on third-party systems shared across competitors. Industry-wide resilience planning is essential.

  • Manual fallback procedures must be maintained. Organisations that had preserved and practised manual check-in processes recovered faster than those that had fully deprecated paper-based backup systems.

  • Cross-border incident coordination remains challenging. The attack highlighted gaps in how European aviation authorities share threat intelligence and coordinate response across national boundaries.

4. Cloud Provider Outages: The Concentration Problem

2025 saw multiple significant cloud outages affecting major providers, each reinforcing lessons about infrastructure dependency and resilience planning.

AWS incidents: A race condition in DynamoDB's DNS caused a major us-east-1 outage affecting 141 AWS services. Because many AWS services depend internally on DynamoDB, a single component failure cascaded across the platform. A separate DNS issue knocked out Disney+, Reddit, McDonald's app, and United Airlines.

Microsoft Azure: October brought a significant Azure outage lasting approximately 50 hours, caused by a networking configuration change in East US2. Azure Front Door, Microsoft's CDN and application delivery service, failed for more than eight hours, disrupting Microsoft 365 and numerous enterprise customers. Over 18,000 users reported Azure issues at the peak.

Google Cloud: A null pointer bug caused a seven-hour outage affecting core GCP services, disrupting Spotify, Gmail, and Fitbit. A separate resource contention issue in Google's authentication system prevented users from accessing services for over an hour.

Business impact: Research indicates that average incident duration varies significantly: AWS averaged 1.5 hours, Google Cloud 5.8 hours, and Azure 14.6 hours. AWS controls approximately 32% of the global cloud market, with Azure at 23%, meaning outages affect substantial portions of global digital infrastructure.

Key lessons:

  • Multi-region is not optional for critical workloads. Organisations relying solely on a single availability zone or region experienced the most significant disruptions. True resilience requires active-active or rapid failover across regions.

  • Understand your dependency chains. Many organisations discovered hidden dependencies during outages - services they thought were independent actually shared underlying infrastructure. Map your complete dependency graph.

  • Cloud provider SLAs do not compensate for business impact. Financial credits for downtime rarely approach actual business losses. Design for resilience rather than relying on contractual remedies.

  • Consider multi-cloud strategically, not reflexively. While multi-cloud can reduce provider concentration risk, it also introduces complexity. Evaluate whether the resilience benefits justify the operational overhead for your specific workloads.

5. AI Security Incidents: Emerging Attack Surfaces

As organisations rapidly adopted generative AI tools, 2025 revealed new categories of security vulnerabilities that traditional security frameworks do not address. As I explored in my post on vibe coding security risks, AI-generated code introduces significant vulnerabilities, but the risks extend beyond code generation.

What happened: Researchers discovered seven vulnerabilities in ChatGPT that allowed attackers to steal personal information from user memories and chat histories. Prompt injection attacks could poison user memories by concealing instructions in websites that ChatGPT was asked to summarise. OpenAI acknowledged that prompt injection may be "permanently" unsolvable for AI-powered browsers.

A separate incident saw over 225,000 OpenAI and ChatGPT credentials appear on dark web markets, harvested by infostealer malware. A coordinated campaign compromised over 40 browser extensions used by 3.7 million professionals, many of which were "productivity" tools with AI features.

Business impact: Research indicates that sensitive data now comprises 34.8% of employee ChatGPT inputs, up from 11% in 2023. Yet nearly 47% of organisations have no AI-specific security controls in place. The disconnect between adoption and security governance creates substantial exposure.

Key lessons:

  • AI tools require specific security policies. General data handling policies do not adequately address risks of sharing sensitive information with external AI services. Establish clear guidelines about what data can and cannot be processed through AI tools.

  • Prompt injection is a fundamentally new attack category. Traditional security controls do not detect or prevent prompt injection. Organisations using AI agents or AI-powered browsing must implement specific controls.

  • Shadow AI is the new shadow IT. Employees adopting AI tools without IT oversight mirrors the shadow IT challenges of the previous decade. Visibility into AI tool usage is essential for managing risk.

  • AI security requires ongoing education. The threat landscape is evolving rapidly. Security awareness training must include AI-specific risks and appropriate usage guidelines.

Common Themes Across 2025 Incidents

Analysing these incidents reveals patterns that should inform security and resilience planning:

Concentration risk is systemic. Whether cloud providers, security tools, or shared infrastructure, dependency on a small number of vendors creates cascading failure potential. Diversification has costs, but single points of failure have proven consequences.

Supply chain security requires continuous attention. Attackers increasingly target suppliers, vendors, and shared services rather than primary targets. Security perimeters must extend beyond organisational boundaries.

Speed of recovery matters more than prevention claims. Every organisation will experience incidents. Those that recovered quickly had tested recovery procedures, maintained backup capabilities, and trained staff on manual processes.

Emerging technologies introduce novel risks. AI-related vulnerabilities do not fit traditional security frameworks. Organisations must develop specific controls for new technology categories rather than assuming existing measures suffice.

Practical Takeaways for 2026 Planning

Based on these incidents, consider the following actions for your organisation:

Immediate Priorities

  1. Audit vendor concentration. Identify critical functions that depend entirely on single vendors. Evaluate whether backup options exist or whether redundancy should be developed.

  2. Test recovery procedures. Schedule tabletop exercises and actual recovery tests for your most critical systems. Validate that documented recovery times match reality.

  3. Review AI policies. Ensure your organisation has explicit policies governing AI tool usage, data sharing with AI services, and AI-generated code review requirements.

  4. Map supply chain dependencies. Document third-party integrations and assess their security posture. Consider what happens if any supplier experiences a significant breach.

Strategic Initiatives

  1. Develop incident response playbooks for novel scenarios. Include scenarios where security tools fail, cloud providers experience extended outages, or AI systems are compromised.

  2. Implement staggered update deployment. Ensure critical systems do not receive updates simultaneously. Build deployment schedules that allow validation before broad rollout.

  3. Establish multi-region resilience. For business-critical workloads, ensure operations can continue if a primary region becomes unavailable.

  4. Build security awareness for emerging threats. Update training programmes to include AI-specific risks, supply chain attacks, and social engineering targeting AI systems.

Quick Reference Checklist

Use this checklist to assess your organisation's readiness based on 2025's lessons:

Vendor Risk Management

  • [ ] Critical security tools have backup alternatives identified
  • [ ] Update deployment schedules are staggered, not simultaneous
  • [ ] Vendor security incidents trigger review of your exposure

Supply Chain Security

  • [ ] Third-party integrations are documented and assessed
  • [ ] Supplier security requirements are contractually defined
  • [ ] Access controls limit supplier system permissions

Cloud Resilience

  • [ ] Critical workloads span multiple regions or zones
  • [ ] Dependency mapping includes hidden infrastructure dependencies
  • [ ] Recovery procedures are tested, not just documented

AI Security

  • [ ] Policies govern AI tool usage and data sharing
  • [ ] AI-generated code undergoes security review
  • [ ] Staff are trained on AI-specific security risks

Incident Response

  • [ ] Playbooks cover scenarios where security tools fail
  • [ ] Manual fallback procedures exist for critical processes
  • [ ] Recovery time objectives are validated through testing

Looking Ahead

The incidents of 2025 are not isolated events but signals of structural challenges in how organisations manage technology risk. Concentration in cloud infrastructure, security tooling, and shared services creates systemic vulnerabilities that individual organisations cannot address alone.

However, the lessons are actionable. Organisations that diversify critical dependencies, maintain recovery capabilities, and adapt security frameworks for emerging technologies will prove more resilient when - not if - the next major incident occurs.


Need Help Strengthening Your Incident Response?

Preparing for the challenges ahead requires experienced guidance. My IT management services help organisations assess vendor risks, develop robust incident response plans, and build resilience into critical infrastructure.

Get in touch to discuss how these lessons apply to your organisation and develop a practical plan for 2026.

Share this post

DG

Daniel J Glover

IT Leader with experience spanning IT management, compliance, development, automation, AI, and project management. I write about technology, leadership, and building better systems.

Let's Work Together

Need expert IT consulting? Let's discuss how I can help your organisation.

Get in Touch