This microblog was originally published as a Twitter thread.
Cisco ACI feels so robust, complex and pricey that you might think it’s built for big networks. In fact, it has many scalability limits that you may run into sooner or later.
Mode: First you have to decide between Multi-pod and Multi-site. Multi-pod is a single distributed fabric and easier to understand. Multi-site makes ACI more robust with separate fabrics but adds management complexity through the Nexus Dashboard Orchestrator (NDO) layer.
Encryption: Multi-site adds the CloudSec option, which encrypts VXLAN tunnels end to end between sites and their spines. Multi-pod can’t do CloudSec, but both can use link-level MACsec encryption.
Multi-pod spines can be connected back-to-back without IPN routers. MACsec would then encrypt the whole VXLAN tunnel between the spines. But this limits the fabric to two pods.
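The encryption choices above can be written down as a small lookup. This is only a sketch of what the thread says, not anything from a Cisco API: the function names and the mode labels `"multi-pod"` and `"multi-site"` are made up for illustration.

```python
from typing import List, Optional


def encryption_options(mode: str) -> List[str]:
    """Return the encryption options the thread lists for an ACI mode.

    `mode` is a hypothetical label: "multi-pod" or "multi-site".
    """
    if mode == "multi-site":
        # CloudSec between sites, plus link-level MACsec.
        return ["cloudsec", "macsec"]
    if mode == "multi-pod":
        # No CloudSec in Multi-pod, only link-level MACsec.
        return ["macsec"]
    raise ValueError(f"unknown mode: {mode!r}")


def max_pods(back_to_back_spines: bool) -> Optional[int]:
    """Back-to-back spine links cap the fabric at two pods.

    None means this sketch does not model a cap (the real limit
    depends on the release and is not stated in the thread).
    """
    return 2 if back_to_back_spines else None
```

For example, `encryption_options("multi-pod")` returns only MACsec, and `max_pods(True)` shows the price of skipping the IPN.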
Remote-leaf: This is an add-on to a single fabric. It works fine with Multi-pod. With Multi-site, a remote leaf can extend only one fabric (site), which means Multi-site stretched VLANs can’t be extended to a remote leaf directly.
L2 scale multi-site: If you want to stretch L2 bridge domains (BDs) between sites, Multi-site with NDO limits stretched BDs to 6000. Contracts between site-specific EPGs create shadow EPGs, which might shrink the usable number even further.
L2 scale fabric: A fabric can have 15000 BDs, but each leaf is limited to 1980 BDs. That’s a surprisingly low number. Leafs have local VLAN scope, so different leafs can each carry their own 1980 BDs. Legacy mode can double the per-leaf number but limits the use of other app-centric and L3 features.
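To see how quickly those two limits bite, here is a minimal sketch that checks a planned BD placement against them. The numbers are the ones quoted in this thread and should be verified against the scalability guide for your release. The function name `check_bd_plan` and the plan format are assumptions made for this example.

```python
# Limits as quoted in the thread; they vary by ACI release and hardware.
FABRIC_BD_LIMIT = 15000
LEAF_BD_LIMIT = 1980
LEAF_BD_LIMIT_LEGACY = 2 * LEAF_BD_LIMIT  # "legacy mode can double it"


def check_bd_plan(plan, legacy_leafs=frozenset()):
    """Return a list of limit violations for a BD placement plan.

    plan maps a leaf name to the set of BD names deployed on that leaf.
    legacy_leafs names the leafs assumed to run in legacy mode.
    """
    problems = []

    # Fabric-wide limit counts each distinct BD once.
    all_bds = set().union(*plan.values())
    if len(all_bds) > FABRIC_BD_LIMIT:
        problems.append(
            f"fabric: {len(all_bds)} BDs exceeds the fabric limit of {FABRIC_BD_LIMIT}"
        )

    # Per-leaf limit counts only the BDs present on that leaf.
    for leaf, bds in sorted(plan.items()):
        limit = LEAF_BD_LIMIT_LEGACY if leaf in legacy_leafs else LEAF_BD_LIMIT
        if len(bds) > limit:
            problems.append(
                f"{leaf}: {len(bds)} BDs exceeds the per-leaf limit of {limit}"
            )
    return problems
```

Because the per-leaf check only looks at BDs placed on that leaf, spreading tenants across leafs helps, until the fabric-wide count becomes the bottleneck instead.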
Switch capacity: A leaf can have enormous capacity, with 36 to 64 100G ports. That means you don’t need many switches in a medium-size network. Dedicating powerful switches to separate roles like service, compute and border leaf often wastes resources and money.
VLAN scale: VXLAN often gives the impression that VLAN scale is automatically multiplied to 16M. In practice it’s much more complex. For starters, an ACI fabric can support 15000 BDs in total. That’s just a fraction of the VXLAN scale.
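The "fraction" is easy to put a number on. The 16M figure comes from the 24-bit VXLAN network identifier (VNI). The 15000 is the fabric BD limit quoted in this thread, and the helper name below is made up for the example.

```python
def bd_share_of_vni_space(bd_limit: int = 15000, vni_bits: int = 24) -> float:
    """Return the fabric BD limit as a fraction of the VXLAN VNI space."""
    vni_space = 2 ** vni_bits  # 16,777,216 possible segment IDs
    return bd_limit / vni_space


share = bd_share_of_vni_space()
print(f"{2 ** 24:,} VNIs, fabric BD limit is {share:.2%} of that")
```

So the fabric-wide BD limit is under a tenth of a percent of what the VXLAN header can address.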
Overlapping VLANs: These can be done but need more complex policies and mapping of specific VLANs to specific resources. Shared external services for all customers are common. That means VLANs are terminated at one common point, which can make overlapping VLANs even harder to handle.
VMM integration: This can help a lot with hypervisors. Dynamic VLAN assignment is handy when EPGs are created automatically at the port-group level. If your hypervisor isn’t supported by VMM, that’s a total bummer. You lose a lot of ACI capabilities.
App-centric mode: ACI scalability is based on a modern application-centric approach where basic VLAN stretching can be avoided. But the question is whether the user can adapt a brownfield network to this new model. It looks very hard, and many stay forever with a traditional L2 network-centric fabric.
Routing: The app-centric model, with EPGs, ESGs, contracts, microsegmentation, service graphs etc., requires ACI to be the anycast gateway for EPGs. Routing then has to be done in ACI, which is a more tedious job. ACI routing has a pretty comprehensive feature set but obviously has its limits too.
L2 need: You may ask why anyone would do heavy L2 stretching between sites and need thousands of VLANs everywhere. The answer is that customers are usually old-school and require their HA setup to be distributed between two sites in a metro area.
Cloud Availability Zone: Multi-pod is better for stretched BDs, and Multi-site is better for two independent sites with different subnets. Multi-pod is a good fit for one site with multiple pods. It can also be distributed across multiple sites.
Cloud Region: Multi-site actually needs Multi-pod inside one site if you want L2 high availability. Multi-pod then provides the L2 availability zones inside one site, and Multi-site forms a region out of multiple L3-separated, independent availability zones, like in a public cloud.
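One way to picture this region and zone layout is as a small data model. It is purely illustrative: the class names `Site` and `Region` and the rule "at least two pods means L2 HA is possible inside the site" are assumptions drawn from the paragraph above, not ACI objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Site:
    """One ACI fabric in the Multi-site sense; its pods form a Multi-pod fabric."""
    name: str
    pods: List[str] = field(default_factory=list)

    def has_l2_ha(self) -> bool:
        # L2 HA inside a site needs at least two pods to stretch BDs across.
        return len(self.pods) >= 2


@dataclass
class Region:
    """Multi-site: several L3-separated, independent sites."""
    sites: List[Site]

    def l2_ha_sites(self) -> List[str]:
        return [site.name for site in self.sites if site.has_l2_ha()]


region = Region([
    Site("dc1", pods=["pod1", "pod2"]),  # Multi-pod inside the site
    Site("dc2", pods=["pod1"]),          # single pod, no L2 HA
])
print(region.l2_ha_sites())
```

In this model, only `dc1` can offer stretched-BD high availability, while `dc1` and `dc2` together still behave like two independent zones of one region.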
Price: The conclusion is that ACI can scale, but you have to buy many more components around the basic setup and build your network on ACI-supported architectures. This can mean heavy lifting and a lot of digging into platform limits to get the design right.