Azure VM Not Starting: Complete Troubleshooting Guide 2026

Complete Azure VM troubleshooting guide with boot diagnostics, serial console commands, disk repair procedures, and enterprise best practices for 2026.

By Naveed Alam

Mar 16, 2026 8 min read Updated Apr 29, 2026

When an Azure VM refuses to start, every minute counts: production workloads go offline, databases stop responding, and SLA clocks start ticking. After recovering 300+ VM boot failures across healthcare, finance, and enterprise environments, I have found that the fastest path to resolution is always the same: start with boot diagnostics, rule out Azure platform issues, then work down to OS-level repairs.

This guide gives you a systematic 8-step approach to diagnose and fix Azure VM startup failures — covering everything from allocation errors and BCD corruption to disk attachment issues and kernel panics. For related connectivity issues once the VM is up, see our guides on Azure VPN Gateway troubleshooting and VPN connectivity problems.

Written by Naveed Alam — Network & Cloud Engineer with 8+ years of hands-on experience in Azure infrastructure, Windows Server, and enterprise virtualization troubleshooting.

Business Impact and Common Scenarios
Azure VM Boot Architecture
Prerequisites
Step-by-Step Troubleshooting Guide
Real-World Case Study
Troubleshooting Common Issues
Best Practices for VM Reliability
Security and Performance Considerations

Business Impact and Common Scenarios

VM startup failures are high-impact events. According to Gartner, unplanned downtime for enterprise applications averages $5,600 per minute in direct revenue loss — and that doesn’t account for SLA penalties or reputational damage.

These failures most commonly occur after Windows Updates or Linux kernel upgrades, disk expansion operations, VM resize events, or Azure maintenance windows. NSG modifications can produce the same symptom by blocking RDP/SSH, which makes a VM that booted normally look as if it never started.

Azure VM Boot Architecture

Understanding the boot sequence helps you pinpoint exactly where the failure occurs. For full architectural details, refer to the Microsoft Azure Virtual Machines documentation.

Boot Sequence (6 Stages)

Stage 1: Fabric Controller Initialization — Azure allocates compute resources and prepares the virtualization environment.

Stage 2: Virtual Hardware Configuration — Azure configures vCPUs, memory, network interfaces, and attaches virtual disks.

Stage 3: BIOS/UEFI Initialization — Virtual firmware performs POST and locates the boot device.

Stage 4: Boot Loader Execution — Windows Boot Manager or GRUB loads.

Stage 5: OS Kernel Load — The kernel initializes hardware drivers and mounts the root filesystem.

Stage 6: Service Startup — System services, networking components, and applications start.

Key Azure Diagnostic Tools

Boot Diagnostics: Screenshot and serial console log capture — provides visibility into the boot process without requiring network connectivity.

Serial Console: Direct serial port access, bypassing RDP/SSH completely.

Run Command: Executes scripts on a VM through the Azure guest agent, so it works when RDP/SSH is unreachable as long as the OS has booted and the agent is running.

Prerequisites

Owner, Contributor, or Virtual Machine Contributor role on the affected VM
Azure CLI 2.50.0 or later
Azure PowerShell Az module 10.0+
VM local administrator credentials
Boot diagnostics storage account access
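
A quick way to confirm the CLI prerequisite is to compare the installed version against the 2.50.0 minimum listed above. The sketch below assumes GNU `sort -V` is available. `version_ge` and `check_az_version` are hypothetical helper names, not part of the Azure CLI.

```shell
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumption: sort supports -V (version sort), as GNU coreutils does.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_az_version: reads the installed CLI version and compares it with the minimum.
check_az_version() {
  local required="2.50.0" installed
  installed="$(az version --query '"azure-cli"' --output tsv)"
  if version_ge "$installed" "$required"; then
    echo "Azure CLI $installed meets the $required minimum"
  else
    echo "Azure CLI $installed is older than $required; run 'az upgrade'"
    return 1
  fi
}
```

Run `check_az_version` once before starting the troubleshooting steps. A non-zero exit status means the CLI should be upgraded first.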

Step-by-Step Troubleshooting Guide

Step 1: Check VM Power State

Always start here: the power state tells you whether you are facing a platform-level allocation failure or an OS-level boot problem.

# Get VM power state
az vm get-instance-view \
  --name MyVM \
  --resource-group MyResourceGroup \
  --query "instanceView.statuses[?starts_with(code, 'PowerState/')].displayStatus" \
  --output tsv

# Expected: "VM running", "VM starting", "VM stopped", or "VM deallocated"
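
To turn the power state into a decision, the check can be wrapped in a small triage helper. This is a minimal sketch: the function names are hypothetical, and the suggested next steps are one possible mapping onto this guide's steps, not an official Azure decision tree.

```shell
# next_step_for_state STATE: maps a power-state display string to a suggested next step.
next_step_for_state() {
  case "$1" in
    "VM running")
      echo "Platform reports running: review boot diagnostics (Step 2)" ;;
    "VM starting")
      echo "Start in progress: check the activity log for AllocationFailed (Step 5)" ;;
    "VM stopped"|"VM deallocated")
      echo "VM is not started: start it, then check the activity log if the start fails (Step 5)" ;;
    *)
      echo "Unrecognized state '$1': check the activity log (Step 5)" ;;
  esac
}

# triage_vm RESOURCE_GROUP VM_NAME: queries the power state and prints the suggestion.
triage_vm() {
  local state
  state="$(az vm get-instance-view --resource-group "$1" --name "$2" \
    --query "instanceView.statuses[?starts_with(code, 'PowerState/')].displayStatus" \
    --output tsv)"
  next_step_for_state "$state"
}
```

Example: `triage_vm MyResourceGroup MyVM`.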

Step 2: Review Boot Diagnostics Screenshot

Navigate to: VM → Boot diagnostics → Screenshot. Look for: blue screens (BSOD), “BOOTMGR is missing”, kernel panic messages, or filesystem mount failures. This tells you exactly which boot stage failed.
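
The same check can be approximated from the command line by pulling the serial boot log and searching it for the failure messages named above. The sketch below is a rough heuristic: `classify_boot_log` is a hypothetical helper, and the mapping from message to boot stage is an assumption based on the six-stage sequence in this guide. Windows blue screens usually show up in the screenshot rather than the serial log, so treat a "no pattern" result as inconclusive.

```shell
# classify_boot_log: reads a boot log on stdin and prints the boot stage that a
# known failure message points to (heuristic, first match wins).
classify_boot_log() {
  local log
  log="$(cat)"
  if printf '%s\n' "$log" | grep -qiE 'No bootable device|Operating system not found'; then
    echo "Stage 3: firmware could not find a boot device"
  elif printf '%s\n' "$log" | grep -qi 'BOOTMGR is missing'; then
    echo "Stage 4: boot loader"
  elif printf '%s\n' "$log" | grep -qiE 'Kernel panic|INACCESSIBLE_BOOT_DEVICE'; then
    echo "Stage 5: OS kernel load"
  else
    echo "No known failure pattern found"
  fi
}

# Usage (requires boot diagnostics to be enabled on the VM):
#   az vm boot-diagnostics get-boot-log \
#     --resource-group MyResourceGroup --name MyVM | classify_boot_log
```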

Step 3: Access Serial Console

Serial Console bypasses network connectivity entirely, which makes it the most powerful tool when RDP/SSH is unavailable.

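# Windows Serial Console commands (run from a CMD channel in SAC)
# Check running services
sc query

# Boot repair commands; bootrec ships with the Windows Recovery Environment,
# so they are unavailable in a normally booted Windows session
# Fix Master Boot Record
bootrec /fixmbr
# Fix boot sector
bootrec /fixboot
# Rebuild BCD store
bootrec /rebuildbcd

# Linux Serial Console commands
dmesg | less       # View boot messages
journalctl -xb     # Review system logs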

Step 4: Check Disk Health

# List disks attached to VM
az vm show \
  --resource-group MyResourceGroup \
  --name MyVM \
  --query "storageProfile.{osDisk:osDisk, dataDisks:dataDisks}" \
  --output json

# Verify OS disk state
az disk show \
  --resource-group MyResourceGroup \
  --name MyVM-osdisk \
  --query "{name:name, state:diskState}" \
  --output table
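
The `diskState` value is easier to act on with a short interpretation next to it. The helper below is a sketch covering only the states most relevant to boot failures. The function names are hypothetical, and other states fall through to a generic message.

```shell
# explain_disk_state STATE: prints a short interpretation of a managed disk state.
explain_disk_state() {
  case "$1" in
    Attached)   echo "attached to a running VM" ;;
    Reserved)   echo "attached to a deallocated VM" ;;
    Unattached) echo "not attached to any VM" ;;
    ActiveSAS)  echo "export SAS is active; run az disk revoke-access before attaching" ;;
    *)          echo "state not covered by this sketch" ;;
  esac
}

# check_disk RESOURCE_GROUP DISK_NAME: queries diskState and prints it with the interpretation.
check_disk() {
  local state
  state="$(az disk show --resource-group "$1" --name "$2" \
    --query diskState --output tsv)"
  echo "$2: $state, $(explain_disk_state "$state")"
}
```

Example: `check_disk MyResourceGroup MyVM-osdisk`. An OS disk that reports `Unattached` while the VM is supposed to be running is a strong hint that the problem is disk attachment rather than the guest OS.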

Step 5: Review Activity and Resource Logs

# Check for AllocationFailed or platform errors
az monitor activity-log list \
  --resource-group MyResourceGroup \
  --offset 24h \
  --query "[?contains(resourceId, 'MyVM')].{Time:eventTimestamp, Status:status.value, Operation:operationName.localizedValue}" \
  --output table

Step 6: Validate Network Configuration

A VM may boot successfully but appear unreachable due to NSG rules blocking RDP/SSH.

# Check NSG rules
az network nsg show \
  --resource-group MyResourceGroup \
  --name MyVM-nsg \
  --query "securityRules[].{Name:name, Priority:priority, Access:access, DestPort:destinationPortRange}" \
  --output table
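
Reading a rule table by eye is error-prone because the lowest priority number wins. The sketch below finds the first inbound custom rule that covers a given port. `first_match_for_port` is a hypothetical helper with deliberate limits: it looks only at the single `destinationPortRange` field (not the plural `destinationPortRanges` list), ignores source prefixes and protocol, and does not evaluate the default NSG rules.

```shell
# first_match_for_port PORT: reads tab-separated rows of
#   name  priority  direction  access  destinationPortRange
# on stdin and prints the access of the lowest-numbered inbound rule covering PORT.
first_match_for_port() {
  sort -t "$(printf '\t')" -k2,2n | awk -F '\t' -v port="$1" '
    $3 != "Inbound" { next }
    {
      # A port field is "*", a single port, or a "low-high" range.
      split($5, r, "-")
      lo = r[1]; hi = (r[2] == "" ? r[1] : r[2])
      if ($5 == "*" || (port + 0 >= lo + 0 && port + 0 <= hi + 0)) {
        print $4 " by rule " $1 " (priority " $2 ")"
        found = 1
        exit
      }
    }
    END { if (!found) print "No custom rule matches; default NSG rules apply" }
  '
}

# Usage:
#   az network nsg show --resource-group MyResourceGroup --name MyVM-nsg \
#     --query "securityRules[].[name, priority, direction, access, destinationPortRange]" \
#     --output tsv | first_match_for_port 3389
```

A result starting with `Deny` means a custom rule blocks the port before any allow rule is reached, which matches the "boots but unreachable" symptom.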

Step 7: Use the Azure VM Repair Extension

The VM repair extension automates common repair tasks and is often the fastest path to resolution for BCD corruption and disk issues.

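# Install the vm-repair CLI extension if it is not already present
az extension add --name vm-repair

# Create repair VM automatically
az vm repair create \
  --resource-group MyResourceGroup \
  --name MyVM \
  --repair-username azureuser \
  --verbose

# Run built-in repair scripts (list the available run IDs with: az vm repair list-scripts)
az vm repair run \
  --resource-group MyResourceGroup \
  --name MyVM \
  --run-id win-bootmgr-repair \
  --verbose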

Step 8: Restore from Snapshot or Backup

When repairs fail, restoring from a known-good snapshot is the cleanest resolution.

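# Create new disk from snapshot (replace <snapshot-name-or-id> with your snapshot)
az disk create \
  --resource-group MyResourceGroup \
  --name MyVM-restored-disk \
  --source "<snapshot-name-or-id>" \
  --sku Premium_LRS

# Swap the OS disk (the VM must be stopped and deallocated first)
az vm deallocate --resource-group MyResourceGroup --name MyVM
az vm update \
  --resource-group MyResourceGroup \
  --name MyVM \
  --os-disk MyVM-restored-disk
az vm start --resource-group MyResourceGroup --name MyVM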

Real-World Case Study: Healthcare Provider

Situation: A regional healthcare network serving 200,000 patients had its primary SQL Server VM (hosting their EMR system) fail to restart after an Azure platform maintenance event. The VM was stuck in “Starting” state for 45 minutes, cutting off access to patient records.

Root Cause: Activity logs revealed an AllocationFailed error caused by host capacity constraints following the platform update.

Resolution: Immediately failed over to the secondary read-replica, then resolved the original VM by resizing to Standard_E8s_v4 to hit a different allocation pool. Databases were resynchronized with zero data loss.

Total downtime: 47 minutes (within the 1-hour RTO). Preventive measures added: Availability Zones deployment, capacity reservations for critical VMs, automated health checks every 5 minutes.

Troubleshooting Common Issues

Issue 1: BCD Corruption (Windows)

Symptoms: “BOOTMGR is missing”, INACCESSIBLE_BOOT_DEVICE blue screen, or boot stops at Windows Boot Manager.

# On rescue VM with broken disk attached (F: is the assumed drive letter of that disk)
bootrec /fixmbr
bootrec /fixboot
bootrec /rebuildbcd

# If the above fails, recreate BCD manually
bcdedit /createstore F:\Boot\BCD
bcdboot F:\Windows /s F: /f ALL

Issue 2: Kernel Panic (Linux)

Symptoms: “Kernel panic - not syncing” messages, boot output that freezes partway through, or a VM that never reaches the login prompt.

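# Reinstall kernel on rescue VM (Debian/Ubuntu; broken OS disk mounted at /mnt/broken)
for fs in dev proc sys; do sudo mount --bind /$fs /mnt/broken/$fs; done
sudo chroot /mnt/broken
# Inside the chroot, uname -r reports the rescue VM's kernel, so list the installed kernels instead
dpkg -l 'linux-image-*' | grep '^ii'
apt-get install --reinstall "linux-image-<version-from-list>"
update-initramfs -u -k all
update-grub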

Issue 3: Azure Allocation Failures

Symptoms: VM stuck in “Starting” state indefinitely, “AllocationFailed” in activity log.

# Resize VM to hit a different allocation pool
az vm deallocate --resource-group MyResourceGroup --name MyVM
az vm resize \
  --resource-group MyResourceGroup \
  --name MyVM \
  --size Standard_D4s_v5
az vm start --resource-group MyResourceGroup --name MyVM
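
When the first alternative size also fails to allocate, the same deallocate, resize, and start sequence can be repeated over a list of candidates. The helper below is a sketch of that loop. `start_with_fallback_sizes` is a hypothetical name, and the candidate sizes passed in are your own choice (`az vm list-vm-resize-options` shows what the VM can be resized to).

```shell
# start_with_fallback_sizes RESOURCE_GROUP VM_NAME SIZE...
# Deallocates the VM, then tries each size in order until a start succeeds.
start_with_fallback_sizes() {
  local rg="$1" vm="$2" size
  shift 2
  az vm deallocate --resource-group "$rg" --name "$vm" || return 1
  for size in "$@"; do
    echo "Trying size $size"
    if az vm resize --resource-group "$rg" --name "$vm" --size "$size" &&
       az vm start --resource-group "$rg" --name "$vm"; then
      echo "Started $vm as $size"
      return 0
    fi
  done
  echo "No candidate size could be allocated" >&2
  return 1
}
```

Example: `start_with_fallback_sizes MyResourceGroup MyVM Standard_D4s_v5 Standard_E4s_v5`. Order the list by preference, since the VM stays on whichever size starts first and can be resized back once capacity returns.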

Issue 4: Disk Attachment Failures

Symptoms: “No bootable device”, “Operating system not found”, disk attachment timeout errors.

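# Verify disk state
az disk show \
  --resource-group MyResourceGroup \
  --name MyVM-osdisk \
  --query "{state:diskState, owner:managedBy}"

# Reattach to VM (az vm disk attach adds the disk as a data disk;
# to restore it as the OS disk, use az vm update --os-disk as in Step 8)
az vm disk attach \
  --resource-group MyResourceGroup \
  --vm-name MyVM \
  --name MyVM-osdisk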

Issue 5: NSG Blocking Access (VM Boots but Appears Unreachable)

Symptoms: Boot diagnostics shows a login prompt, VM metrics show activity, but RDP/SSH fails.

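# Add NSG rule for RDP (restrict the source to your admin IP rather than leaving it open)
az network nsg rule create \
  --resource-group MyResourceGroup \
  --nsg-name MyVM-nsg \
  --name Allow-RDP \
  --priority 100 \
  --direction Inbound \
  --source-address-prefixes "<your-admin-ip>" \
  --destination-port-ranges 3389 \
  --access Allow \
  --protocol Tcp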

Best Practices for VM Reliability

Enable Boot Diagnostics on all VMs for screenshot and serial console access
Daily Automated Snapshots for quick recovery from any failure
Availability Zones for datacenter-level fault isolation
Azure Backup with appropriate retention policies
Managed Disks for better reliability over unmanaged disks
VM Health Alerts for availability and performance thresholds
Documented Runbooks so any team member can execute recovery procedures
Quarterly DR Drills with monthly restoration tests
Infrastructure as Code (ARM or Terraform) for consistent, repeatable deployments
Capacity Reservations for mission-critical VMs to avoid AllocationFailed errors
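
As one way to implement the daily snapshot practice above, the sketch below snapshots a VM's OS disk under a date-stamped name. The function names and the naming scheme are assumptions for illustration. Scheduling (cron, Azure Automation, or a pipeline) and retention cleanup are left out.

```shell
# snapshot_name DISK_NAME [DATE]: builds a date-stamped snapshot name.
# DATE defaults to today's UTC date in YYYYMMDD form.
snapshot_name() {
  echo "$1-snap-${2:-$(date -u +%Y%m%d)}"
}

# snapshot_os_disk RESOURCE_GROUP VM_NAME: snapshots the VM's managed OS disk.
snapshot_os_disk() {
  local rg="$1" vm="$2" disk_id disk_name
  disk_id="$(az vm show --resource-group "$rg" --name "$vm" \
    --query "storageProfile.osDisk.managedDisk.id" --output tsv)"
  disk_name="${disk_id##*/}"   # last path segment of the resource ID
  az snapshot create --resource-group "$rg" \
    --name "$(snapshot_name "$disk_name")" \
    --source "$disk_id"
}
```

Example: `snapshot_os_disk MyResourceGroup MyVM`. Because the name includes the date, a second run on the same day targets the same snapshot name, so add a time component if you need more than one per day.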

Security Considerations

Implement RBAC with least-privilege access and Just-In-Time VM access
Store credentials in Azure Key Vault
Use Azure Bastion for secure access without public IPs
Enable network flow logs for traffic analysis

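# Enable Azure Disk Encryption (replace <key-vault-resource-id> with the vault's resource ID)
Set-AzVMDiskEncryptionExtension `
  -ResourceGroupName "MyResourceGroup" `
  -VMName "MyVM" `
  -DiskEncryptionKeyVaultUrl "https://mykeyvault.vault.azure.net/" `
  -DiskEncryptionKeyVaultId "<key-vault-resource-id>" `
  -VolumeType "All"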

Performance Optimization

Dv4/Dsv4 for general purpose workloads
Ev4/Esv4 for memory-intensive applications
Fv2 for compute-optimized workloads

# Use Premium SSD for OS disk
az disk create \
  --resource-group MyResourceGroup \
  --name MyVM-osdisk-premium \
  --sku Premium_LRS \
  --size-gb 128

# Enable accelerated networking
az network nic update \
  --resource-group MyResourceGroup \
  --name MyVM-nic \
  --accelerated-networking true

Conclusion

Azure VM boot failures are almost always solvable — the key is working systematically from platform-level checks (power state, allocation, activity logs) down to OS-level repairs (BCD, kernel, disk). Enable boot diagnostics on every VM before problems occur, maintain regular snapshots, and document your recovery runbooks so your team can respond fast under pressure.

For related infrastructure challenges, see our guides on Azure VPN Gateway configuration and cloud migrations.

Professional Consulting Services

Need expert help recovering a failed VM or building a more resilient Azure infrastructure? I provide emergency VM recovery, high-availability architecture design, and Azure Backup configuration for organizations worldwide.

Contact: itexpert@navedalam.com | WhatsApp: +92 311 935 8005 | Schedule a free consultation

About the Author

Naveed Alam is a Network & Cloud Engineer with 8+ years of experience in Azure infrastructure, virtual machine management, and enterprise cloud solutions. CCNA, AZ-900, and CompTIA A+ certified. Has resolved 300+ Azure VM boot failures across industries ranging from healthcare to financial services.
