Watson AIOps
Role: Lead Designer
Region: Global
Industry: Cloud & AI
Company: IBM
Timeline: 1.5 Years

Overview

Watson AIOps is an AI-driven platform that transforms IT incident resolution by unifying fragmented tools, reducing noise, and providing actionable insights. I led the design effort to create a scalable, intuitive solution tailored for IT teams like Site Reliability Engineers (SREs), IT Administrators, and Operators.

Challenge

  • User Problem: IT teams are overwhelmed by thousands of daily alerts and struggle to identify and resolve critical issues amid fragmented tools and constant noise.
  • Product Team Challenge: Integrating IBM’s existing tools and acquisitions into a seamless platform that addresses user needs while aligning with enterprise-level goals.

Approach

  • Strategic Leadership: Led a cross-functional team of UX/UI designers and researchers, ensuring alignment with technical and business priorities. Collaborated with other Design Leads, developed roadmaps, planned releases, and used storytelling to align stakeholders.
  • Continuous Research: Conducted ongoing interviews with more than 25 participants across 9 industries, complemented by site visits and usability tests, uncovering pain points like notification fatigue and workflow inefficiencies.
  • Design Contributions: Unified workflows across fragmented IBM tools and tailored the platform to engineers’ UI preferences through:
    • The Resolution Hub: A centralized space for prioritizing and managing alerts and incidents, reducing cognitive load and enabling faster decisions.
    • Stories Overview: Consolidated incident details—alerts, history, and tickets—into one interface for clarity and collaboration.
    • Topology Overhaul: Redesigned visualizations with contextual side panels, animations, and modern UI to simplify complex relationships.
    • Alerts and Monitoring: Enhanced filtering and monitoring tools, enabling deeper exploration of critical issues and faster resolutions.

Solution

Watson AIOps unified IBM’s fragmented tools into a cohesive, AI-driven platform. Automated event correlation grouped related alerts into actionable stories, significantly reducing noise. Centralized features like the Resolution Hub, Stories Overview, and Topology provided clear workflows and intuitive visualizations, enabling SREs to quickly identify, prioritize, and resolve critical issues.

Outcome

Faster Resolutions: SREs and IT teams can resolve incidents significantly faster by cutting through noise, prioritizing critical alerts, and acting on AI-driven recommendations. For example, one customer reduced resolution times from weeks to just hours, minimizing downtime and its impact on business operations.

Improved Efficiency: Automated event correlation and streamlined workflows save IT teams valuable time. For example, a single organization saved over 1,000 hours annually by eliminating manual tasks, allowing its teams to focus on higher-value activities.

Enhanced Confidence: Intuitive, user-centered designs empower IT teams to trust AI-driven insights and make decisions with greater confidence. Features like the Resolution Hub and Topology deliver clarity and actionable insights tailored to engineers’ workflows.

Sustainability Gains: Optimized resource usage reduces energy consumption and carbon emissions in data centers, supporting long-term sustainability goals. For instance, automation improvements across server rooms have led to significant reductions in power usage and cooling requirements.

Industry Recognition: Watson AIOps has won multiple design awards for innovation, including recognition for its ability to simplify IT workflows and deliver measurable ROI.

“Resolution times are not three weeks, but one hour... We more or less have the global environment under control in one single point of view.”

Joska Lot, Global Solution Service Architect, Electrolux AB
