Cisco ACI day 2: Real life design and operational issues

This post was originally published as a Twitter thread on 15.12.2020.

The ACI journey continues. I’m now involved in real-life designs and hands-on deployments. Here are some thoughts on design and the issues I have run into.

Resource objects are still hard to understand properly. Why this and that, why so many pieces? What is the best combination of VLAN pools, domains, interface profiles, etc. for your own environment? Cisco publishes some basic best practices, but they don’t give the deeper understanding I’m after.

I’d like to see a more detailed ACI design guide that explains the background, so that I could understand the options and the consequences of each choice. Any pointers to blogs or other resources covering these topics?

Resource creation is important because it’s the foundation of network connectivity. Changing fundamental settings later is hard and time-consuming. For example, we had to change the static VMM VLAN pool to a dynamic one. The task itself was quite easy, but it was still a disruptive operation.
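To make the static versus dynamic choice concrete, here is a minimal Python sketch that builds an APIC-style JSON body for a VLAN pool. The class and attribute names (fvnsVlanInstP, fvnsEncapBlk, allocMode) follow the ACI object model as I understand it. The pool name and VLAN range are made up for illustration, and nothing is sent to an APIC.

```python
import json


def vlan_pool_payload(name, first_vlan, last_vlan, alloc_mode="dynamic"):
    """Build an APIC-style JSON body for a VLAN pool with one encap block.

    The pool name and VLAN range are hypothetical example values.
    """
    if alloc_mode not in ("static", "dynamic"):
        raise ValueError("alloc_mode must be 'static' or 'dynamic'")
    if not 1 <= first_vlan <= last_vlan <= 4094:
        raise ValueError("invalid VLAN range")
    return {
        "fvnsVlanInstP": {
            "attributes": {"name": name, "allocMode": alloc_mode},
            "children": [
                {
                    "fvnsEncapBlk": {
                        "attributes": {
                            "from": f"vlan-{first_vlan}",
                            "to": f"vlan-{last_vlan}",
                        }
                    }
                }
            ],
        }
    }


if __name__ == "__main__":
    # Print the body for a dynamic pool covering VLANs 1000-1099.
    print(json.dumps(vlan_pool_payload("vmm-pool", 1000, 1099), indent=2))
```

Keeping the allocation mode as an explicit parameter forces the decision to be made when the pool is defined, which is exactly the point where it is cheapest to get right.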

We also considered migrating the FW from single ports to vPCs to make it more redundant. The idea was discarded because the change would have been too hard to do in production on short notice. Modifications like this need well-designed config changes and ACI operations, and the maintenance window could be longer than usual.

Integrating physical FW and ADC appliances into the customer tenant was one design issue. Initially, we thought Service Graph would be a modern, flexible way to do it. It turned out to be a complex set of configurations with no real benefits in this case.

Service Graph uses a one-armed FW and PBR, and I was afraid we would run into a use case where this is a show stopper. At the very least, we would eventually get confused by all the PBR policies and contracts. PBR always sounds bad to my ears, but maybe I’m too old to get used to it.

Service Graph configuration was not documented clearly enough for me to understand what I was doing in the short and long term. I found this video the most informative example: https://youtu.be/ryNmeVFYpF0.

After trying out Service Graph and weighing the pros and cons, we discarded it and went with a traditional routed VRF sandwich. It isn’t the fanciest solution, but it is well known, it works, and it relies on traditional routing.

This drove us to use even more VRFs to isolate routing domains such as DMZ networks. That in turn means more transit links and routes between each ACI VRF and the FW appliance. But it’s just simple config repetition and easy routing.
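The repetition can be planned with a few lines of code. Here is a rough Python sketch that carves one transit subnet per VRF out of a supernet and picks an ACI-side and a FW-side address in each. The VRF names, the 10.255.0.0/24 supernet, the /29 size and the first-host/last-host convention are all assumptions for illustration, not our actual addressing.

```python
import ipaddress


def transit_plan(vrfs, supernet="10.255.0.0/24", prefixlen=29):
    """Assign one transit subnet per VRF between ACI and the FW.

    The first usable host goes to the ACI side and the last usable host
    to the FW side. All defaults are hypothetical example values.
    """
    subnets = ipaddress.ip_network(supernet).subnets(new_prefix=prefixlen)
    plan = {}
    for vrf, subnet in zip(vrfs, subnets):
        hosts = list(subnet.hosts())
        plan[vrf] = {
            "subnet": str(subnet),
            "aci": str(hosts[0]),
            "fw": str(hosts[-1]),
        }
    if len(plan) < len(vrfs):
        raise ValueError("supernet too small for the number of VRFs")
    return plan


if __name__ == "__main__":
    for vrf, link in transit_plan(["prod", "dmz", "mgmt"]).items():
        print(vrf, link)
```

Each entry then maps directly to one transit link plus the static routes pointing at the opposite side, so adding a new isolated VRF is one more line in the list.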

Why not use OSPF between ACI and the FW? We considered it, but it didn’t make sense to run OSPF on a LAN segment with four routers. The links should have been P2P, but changing them would have been too laborious to be worth it.

Again, the original design is to blame, and we are back to square one with our initial choices. The lesson is to design the whole system and service model properly in advance.

Contracts are one part of traffic policies. I’m just saying they add one more level of complexity. In network-centric mode it is better to use allow-any type contracts where needed. Even then, contracts can be applied in different directions and at different levels: between tenants, VRFs, L3outs and EPGs.
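As an illustration of what an allow-any contract looks like as an object, here is a small Python sketch that builds the JSON body for a contract with a single subject referencing the "default" filter. It assumes the standard ACI class names (vzBrCP, vzSubj, vzRsSubjFiltAtt) and that the "default" filter from tenant common, which permits all traffic, is in use. The contract and subject names are hypothetical.

```python
def permit_any_contract(name, scope="context"):
    """Build an APIC-style JSON body for an allow-any contract.

    Assumes the 'default' filter (permit everything) resolves from
    tenant common. Contract and subject names are example values.
    """
    allowed_scopes = ("application-profile", "context", "tenant", "global")
    if scope not in allowed_scopes:
        raise ValueError(f"scope must be one of {allowed_scopes}")
    return {
        "vzBrCP": {
            "attributes": {"name": name, "scope": scope},
            "children": [
                {
                    "vzSubj": {
                        "attributes": {"name": "any"},
                        "children": [
                            {
                                "vzRsSubjFiltAtt": {
                                    "attributes": {"tnVzFilterName": "default"}
                                }
                            }
                        ],
                    }
                }
            ],
        }
    }
```

The scope attribute is where the different levels mentioned above show up: "context" keeps the contract inside one VRF, while "tenant" and "global" widen where it can be provided and consumed.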

I’ve been thinking about another new deployment case and how to proceed step by step. Eventually it’s going to be app-centric, but getting there will take several years and may never be completed. So the initial step probably has to be a network-centric migration.

A network-centric migration means a much larger number of changes in the long run, because you first migrate the network one-to-one and then rearrange all servers, apps and network services.

The hard work is figuring out all the systems, applications, and their components and relations in your environment. Once you have clear documentation and design goals, the ACI configuration itself is doable.

I’m convinced that an automation tool would be helpful right from the start. One good way to build ACI is to take an API-first strategy: model and code all configurations outside ACI.
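A minimal sketch of what "model outside ACI" could look like in Python: the tenant, its VRFs and bridge domains live as plain data, get validated, and are rendered into a JSON body. The class names fvTenant, fvCtx, fvBD and fvRsCtx come from the ACI object model. The Tenant and BridgeDomain helper classes and all object names are hypothetical. Such a body would typically be POSTed to the APIC REST API (for example to /api/mo/uni.json) after authentication, which is left out here.

```python
import json
from dataclasses import dataclass, field


@dataclass
class BridgeDomain:
    name: str
    vrf: str  # name of the VRF this BD belongs to


@dataclass
class Tenant:
    name: str
    vrfs: list = field(default_factory=list)
    bridge_domains: list = field(default_factory=list)

    def to_payload(self):
        """Render the model as an APIC-style JSON body (nothing is sent)."""
        children = [{"fvCtx": {"attributes": {"name": v}}} for v in self.vrfs]
        for bd in self.bridge_domains:
            # Catch dangling references before they ever reach the fabric.
            if bd.vrf not in self.vrfs:
                raise ValueError(f"BD {bd.name} references unknown VRF {bd.vrf}")
            children.append({
                "fvBD": {
                    "attributes": {"name": bd.name},
                    "children": [
                        {"fvRsCtx": {"attributes": {"tnFvCtxName": bd.vrf}}}
                    ],
                }
            })
        return {
            "fvTenant": {"attributes": {"name": self.name}, "children": children}
        }


if __name__ == "__main__":
    tenant = Tenant(
        name="customer-a",
        vrfs=["prod", "dmz"],
        bridge_domains=[BridgeDomain("bd-web", "dmz")],
    )
    print(json.dumps(tenant.to_payload(), indent=2))
```

The benefit is that the model can be reviewed, version-controlled and checked for mistakes before anything touches production, whichever automation tool ends up pushing it.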

I also touched briefly on ACI security and hardening. I didn’t find much information about how ACI is secured and hardened on the technical level. I assume security is mostly built in and I can rely on it.

The most important part is to isolate management access and functions from public and customer-facing networks. CoPP is on by default and offers the pre-defined levels strict, medium and permissive. As always, you must know your protocols and fine-tune the pps levels accordingly.

You can use basic rate-limiting (Data Plane Policing, DPP) and port security to limit access port traffic. Storm control is also available.

More detailed third-party ACI security and risk assessments are available in the ERNW papers: ERNW_Whitepaper68_Vulnerability_Assessment_Cisco_ACI_signed.pdf and TROOPERS19_DM_Threat_Modelling_Cisco_ACI.pdf
