There are four essential elements to consider when developing a governance structure for your cloud journey. Failure to address these points frequently leads to a variety of pains that are difficult to undo. These four elements are as follows:
• Subscriptions matter.
• The network has to come first.
• Security is essential.
• Automation is required.
Subscriptions Matter
The fundamental container of resources in Azure is the subscription. How many subscriptions do you need? Start with three and grow beyond that based on these conditions:
• Subscription capacity is exhausted.
• Acquisition and ownership (not just management) of Azure resources takes place in multiple geographical/political/regulatory jurisdictions.
• The “thing” being deployed to Azure is part of your company’s “cost of goods sold.”
This works for most companies. The first subscription is Production, where no standing security access exists (except for your CI/CD runners) outside of Reader roles. The second is Not Production. This subscription is where coordinated nonproduction tiers exist (Dev, Test, Int, Stage, PreProd), with an increasing security posture as the tier level approaches production. The third is your Hub subscription, where core networking, ExpressRoute circuits, etc. are housed and heavily restricted. Visual Studio subscriptions should be provided to developers and IT pros for them to learn and do playground-based work. Keep a tight policy lock on these to prevent data exfiltration. When the developer or IT pro is ready to integrate with others, they move into the controlled Not Production subscription.
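The growth rule above can be written down as a small check. This is only a sketch: the `Workload` class, its field names, and the `reasons_for_new_subscription` helper are hypothetical names chosen for illustration, not part of any Azure SDK.

```python
from dataclasses import dataclass
from typing import List

# The three baseline subscriptions described in the text.
BASELINE_SUBSCRIPTIONS = ("Production", "Not Production", "Hub")


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of something you plan to deploy."""
    name: str
    capacity_exhausted: bool = False            # existing subscription is out of capacity
    multi_jurisdiction_ownership: bool = False  # owned (not just managed) across jurisdictions
    part_of_cogs: bool = False                  # part of the company's cost of goods sold


def reasons_for_new_subscription(workload: Workload) -> List[str]:
    """Return the conditions from the text that justify a subscription beyond the baseline."""
    reasons = []
    if workload.capacity_exhausted:
        reasons.append("subscription capacity is exhausted")
    if workload.multi_jurisdiction_ownership:
        reasons.append("resources are acquired and owned in multiple jurisdictions")
    if workload.part_of_cogs:
        reasons.append("the workload is part of cost of goods sold")
    return reasons


if __name__ == "__main__":
    for w in (Workload("intranet"), Workload("customer-saas", part_of_cogs=True)):
        reasons = reasons_for_new_subscription(w)
        verdict = "new subscription" if reasons else "use the baseline three"
        print(f"{w.name}: {verdict} {reasons}")
```

An empty list means the workload fits within the baseline three; anything else is a prompt for a deliberate decision rather than an automatic one.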
The Network Has to Come First
You cannot govern the cloud without a stable network topology. No amount of serverless or PaaSification of your environment eliminates the need for proper design, operation, and control of networking. These designs tell your application how to operate securely, fail over, and survive a data loss. Ignoring the network is a tragic and expensive mistake. You need not use hub-and-spoke routing and record all traffic. Other solutions lend themselves better to modern network security and intrusion/breach prevention than forced tunneling does, such as Azure Security Center, Monitor, and Advisor, which provide detailed, live, correlated introspection into what is happening in your environment.
Security Is Essential
Implementing least privilege access and regular account reviews is essential. RBAC, principally applied at the resource group level via automation with zero standing access to production, helps prevent a whole host of unwanted experiences. Adopting an “assume breach” stance for everything allows you to focus your data protection efforts where they matter most: in the source system. Subscription-level access should be limited to your automation accounts, break-glass account, and audit solutions (read-only). Privileged identity management and multifactor authentication (MFA) for sensitive operations should be the norm. The need to “see” things via the portal or CLI should diminish the closer you get to production. For example, it is not necessarily true that the SQL administrators need full permissions to the resource groups where the SQL servers are located. Perhaps they need only Reader rights, or no rights except to the emitted logs. You should never allow a change to production absent automation. If you do, your disaster recovery strategy is invalid.
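A regular account review of this kind can be partly automated. The following is a minimal sketch that checks a list of role assignments against the rules stated above. The `RoleAssignment` record, the `kind` labels, and `find_violations` are assumptions made for illustration; in practice the input would come from an export of your actual role assignments. Only the role names (Reader, Contributor, Owner) are real Azure built-in roles.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Principal kinds allowed to hold subscription-level access, per the text.
ALLOWED_AT_SUBSCRIPTION = {"automation", "break_glass", "audit"}


@dataclass(frozen=True)
class RoleAssignment:
    """Hypothetical, simplified view of one role assignment."""
    principal: str
    kind: str         # "user", "automation", "break_glass", or "audit"
    role: str         # for example "Reader", "Contributor", "Owner"
    scope: str        # "subscription" or "resource_group"
    environment: str  # "production" or "nonproduction"


def find_violations(assignments: Iterable[RoleAssignment]) -> List[str]:
    """Flag assignments that break the least-privilege rules described in the text."""
    findings = []
    for a in assignments:
        # Subscription scope is reserved for automation, break-glass, and audit.
        if a.scope == "subscription" and a.kind not in ALLOWED_AT_SUBSCRIPTION:
            findings.append(f"{a.principal}: subscription-level access not permitted for {a.kind}")
        # Audit solutions should be read-only.
        if a.kind == "audit" and a.role != "Reader":
            findings.append(f"{a.principal}: audit access should be read-only, found {a.role}")
        # No standing access for people in production beyond Reader.
        if a.environment == "production" and a.kind == "user" and a.role != "Reader":
            findings.append(f"{a.principal}: standing {a.role} access in production")
    return findings


if __name__ == "__main__":
    sample = [
        RoleAssignment("ci-runner", "automation", "Contributor", "subscription", "production"),
        RoleAssignment("alice", "user", "Contributor", "resource_group", "production"),
        RoleAssignment("bob", "user", "Reader", "resource_group", "production"),
    ]
    for finding in find_violations(sample):
        print(finding)
```

Running a check like this on a schedule, and failing loudly when the list is not empty, turns the account review into something the pipeline enforces rather than something people remember to do.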
Automation Is Required
You cannot effectively manage the cloud via the portal. You cannot effectively govern or secure the cloud without automation. There are simply too many moving parts, too many places to make a mistake, and far too many neat little buttons to push. Starting in your development tier, teams should have portal access to help craft automation scripts that include not just the application, but also the infrastructure and the configuration. As you move closer to production, the rights should be reduced at each tier until nothing but access to the emitted logs remains. This final step is hard and represents an ongoing journey. Reasons will always exist for deviation from automation, but those should be backported to your CI/CD pipelines to return your environment to an automated state for deployments, monitoring, and recovery.
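The tapering of rights from tier to tier can itself be checked by automation. The sketch below assumes a hypothetical mapping of tiers to coarse rights (`portal_write`, `portal_read`, `logs`). The tier names come from the text, while the right labels, the specific allocation per tier, and `check_rights_taper` are illustrative assumptions.

```python
from typing import Dict, FrozenSet, List, Sequence

# Tiers in order of increasing proximity to production.
TIER_ORDER = ["Dev", "Test", "Int", "Stage", "PreProd", "Prod"]

# Hypothetical rights per tier; your own allocation will differ.
TIER_RIGHTS: Dict[str, FrozenSet[str]] = {
    "Dev": frozenset({"portal_write", "portal_read", "logs"}),
    "Test": frozenset({"portal_read", "logs"}),
    "Int": frozenset({"portal_read", "logs"}),
    "Stage": frozenset({"portal_read", "logs"}),
    "PreProd": frozenset({"logs"}),
    "Prod": frozenset({"logs"}),
}


def check_rights_taper(order: Sequence[str], rights: Dict[str, FrozenSet[str]]) -> List[str]:
    """Verify that rights never grow toward production and end at logs only."""
    problems = []
    for earlier, later in zip(order, order[1:]):
        extra = rights[later] - rights[earlier]
        if extra:
            problems.append(f"{later} grants rights that {earlier} does not: {sorted(extra)}")
    if rights[order[-1]] != frozenset({"logs"}):
        problems.append(f"{order[-1]} should allow access to emitted logs only")
    return problems


if __name__ == "__main__":
    issues = check_rights_taper(TIER_ORDER, TIER_RIGHTS)
    print("rights taper correctly" if not issues else "\n".join(issues))
```

Keeping the mapping in source control alongside the pipeline definitions means a deviation shows up as a reviewed change rather than as a quiet click in the portal.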