Cloudify, but at what cost

It looks like we are building everything in the cloud anyway, whether by conscious choice or because we are forced to. How we adopt the cloud matters. My feeling is that neither the journey nor the results will be entirely nice and sweet.

Cloud is complex, and complexity kills people and services. Complexity makes systems fragile, insecure, and hard to operate and maintain. This means availability and security will be lower than they should be, the opposite of the cloud’s promise. Cloud providers introduce new features constantly. What I see is that cloud services are built up incrementally over time. Once there was a clean, small compute service on the internet, but over a decade later it has expanded into a corporate monster that little by little covers every feature companies need. Now we are in a hybrid environment where every private IT function is replicated in a public cloud. This brings out the worst of both worlds. From a networking perspective, it feels like I have seen this already. Everything is being reinvented, now in the cloud context. Cloud networking is a flashback to the 90s, except with new branding for clumsy, cumbersome, and limited features that apparently hide complexity in some sense. Ironically, every feature is a separate product with its own integrations, billing instruments, and limits.

The complexity is pervasive. How do you operate this kind of monster setup? Obviously with Infrastructure-as-Code automation. But why do cloud providers still offer a huge web portal to operate all this? It reminds me a lot of the Cisco ACI GUI and the frustration of trying to get simple things done with reasonable time and effort. Compared with the console, IaC and automation are a totally different story that needs its own developers, tooling, workflows, and long-term dedication and understanding of context. My impression of the state of IaC in an average enterprise cloud is that a lot of new standardized resources can easily be spun up with Terraform or other tools. But what about changing and removing things? Who understands the whole cloud platform and the business logic of its interconnected components? Who knows how the cloud and its services behave when you modify resources that other services depend on? Who has the courage to delete resources manually, or even through the automation pipeline? I see a lot of testing and validation needed there, and a lot of bravery. Bigger architectural changes seem almost impossible to implement once you have started with something slightly wrong. I wonder who is in charge of holistic architectural design, governance, and compliance. We are probably accumulating technical debt and debris right now.
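To make the deletion question concrete, here is a minimal Python sketch of the kind of check I would want before removing anything: given a map of which resource depends on which, list everything that would be affected. The resource names and the DEPENDS_ON inventory are hypothetical, invented for illustration. A real platform would have to build this map from its IaC state or the provider's APIs, and that is exactly the hard part.

```python
from collections import deque

# Hypothetical inventory (an assumption for this sketch): each resource
# maps to the list of resources it depends on.
DEPENDS_ON = {
    "vpc-main": [],
    "subnet-app": ["vpc-main"],
    "db-orders": ["subnet-app"],
    "api-orders": ["subnet-app", "db-orders"],
    "dns-api": ["api-orders"],
    "bucket-logs": [],
}


def dependents_of(target, depends_on):
    """Return every resource that directly or transitively depends on target."""
    # Invert the edges: resource -> resources that depend on it.
    reverse = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(name)

    # Breadth-first walk over the inverted graph, starting from the target.
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for child in reverse.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


def safe_to_delete(target, depends_on):
    """A resource is only safe to delete if nothing depends on it."""
    return not dependents_of(target, depends_on)


if __name__ == "__main__":
    for resource in ("subnet-app", "bucket-logs"):
        print(resource, "->", dependents_of(resource, DEPENDS_ON))
```

The sketch only covers dependencies somebody has written down. The implicit ones, the business logic nobody documented, are where the courage is still needed.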

That’s why I suspect a lot of cloud resources are just sitting there idle and generating a bill. Usage-based billing means you can’t calculate or estimate expenses accurately. Capacity scaling and traffic-based fees add whole new variables to the bill. I understand that companies like monthly opex, but I don’t understand how they can tolerate such unpredictable bills. This feels totally counterintuitive for running a business. FinOps is now a thing, and cloud providers offer estimation and optimization tools and other measures to manage the bill. I see the same cycle spinning again: we get excited about something new, then we overspend on it, and soon come the hangover and remorse. Over time the equation settles down and we get our step up in productivity. The same thing is now happening with AI.

One special thing in the cloud is egress traffic fees. The cloud sucks everything in but lets nothing out without penalizing the user heavily. The problem becomes visible in hybrid and multi-cloud scenarios and in connections between cloud regions. The cloud provider becomes a dictating party in architectural decisions, often for the wrong reasons. Users are pushed to use one cloud for every possible service because it’s simply much cheaper to stay inside that cloud than to use an external service or provider. The big cloud providers hold too much market power, and it will turn against them at some point. Sooner or later, I expect EU regulation to rein in excessive egress fees, which limit competition and architectural choices. So eventually cloud pricing could change to some degree.
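A rough back-of-the-envelope sketch in Python shows how egress fees can tilt an architectural decision. Every number here is an assumption made up for illustration: the per-GB rate, the free allowance, and the traffic volumes are placeholders, not any provider's real prices. Plug in figures from your own bill.

```python
# All constants below are hypothetical placeholders, NOT real provider prices.
ASSUMED_EGRESS_RATE_PER_GB = 0.10   # currency units per GB leaving the cloud
ASSUMED_FREE_GB_PER_MONTH = 100.0   # free egress allowance per month
ASSUMED_INTERNAL_RATE_PER_GB = 0.0  # traffic that stays inside the same cloud


def monthly_transfer_cost(gb_out, rate_per_gb, free_gb=0.0):
    """Cost of moving gb_out gigabytes at a flat rate after a free allowance."""
    billable_gb = max(gb_out - free_gb, 0.0)
    return billable_gb * rate_per_gb


def compare_placement(gb_per_month):
    """Compare keeping a data consumer inside the cloud with hosting it outside."""
    inside = monthly_transfer_cost(gb_per_month, ASSUMED_INTERNAL_RATE_PER_GB)
    outside = monthly_transfer_cost(
        gb_per_month, ASSUMED_EGRESS_RATE_PER_GB, ASSUMED_FREE_GB_PER_MONTH
    )
    return {"inside": inside, "outside": outside, "penalty": outside - inside}


if __name__ == "__main__":
    for volume_gb in (50, 5_000, 50_000):
        result = compare_placement(volume_gb)
        print(f"{volume_gb:>6} GB/month -> external penalty {result['penalty']:.2f}")
```

The exact numbers don't matter. What matters is that the penalty grows linearly with traffic while staying inside costs nothing extra, so the cheapest design is always the one that never leaves the cloud.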

Most cloud services are compatible only within their own cloud. What about multi-cloud? In practice it is just parallel clouds with a lot of overlapping proprietary services and connections, there only to enable certain features on a certain cloud platform for a certain business function. This parallelism adds complexity and requires more tools to manage it. Utilizing resources only partially wastes them. This is also a sustainability issue. Someone else’s computer is not our business: we see only the financial implications and the bill we pay, not the actual factors behind the facade, such as power usage, server counts, or the space needed to run our services. We have to be more aware of, and responsible for, the wasted resources we create.

To make the cloud more efficient for our business, we should use more higher-layer services and get rid of legacy infrastructure components. Software development and applications run the world now, so let the cloud handle infrastructure complexity and optimize it on its own terms. We don’t need to think about it so much if we can abstract our services into cloud-native services like PaaS, FaaS, SaaS, microservices, or vertical business services. This means giving up some decisions and committing to play by the rules of the specific platform and service. Hybrid cloud makes this hard, though. We still build an enterprise perimeter, private networks and services, and security around all that. This makes the cloud not only complex and clumsy but also vulnerable to cyber threats. I’d like to see the old internet-based public cloud come back, something more like a SaaS service. Everything should be implemented using a zero trust model, where security and control are based not on network and infrastructure but on identity and applications. This could help us move away from arduous legacy infrastructure and make the cloud a more modern and secure service platform.

Cloud providers must take responsibility for making their services secure and easy to use without requiring users to specialize deeply in every service detail. Maybe this isn’t even possible in a complex IT world. Cloud users, in turn, must renew their infrastructure and application architectures to utilize cloud-native services. Unfortunately, average enterprises are quite far from that. The hardest part will be the people and the operating models. ITIL processes are so tightly glued to enterprise operations that it’s incredibly hard to move on to DevOps in infrastructure operations. Still, many organizations have taken steps towards it, thanks to software development, which is now part of every company. Brownfield environments will remain locked to the traditional model for a long time.

The point is that lift and shift is not going to work. We need a big change, and it starts with our mindset. It’s software development that now runs IT. Cloud should be a modern platform for business-oriented services and apps. It’s about cloud-native architecture: apps, data, APIs, events, policies, abstraction, identities, the zero trust model, and so on. The change is huge; it’s a complete architectural revolution. But the cloud is not going to change anything under the hood. Organizations still need specialists to architect, run, manage, secure, and govern all these platforms. And technological and organizational silos still exist. Most importantly, organizations need to own the understanding of their business technology, which then leads to vision and responsibility for selecting appropriate cloud services. Private cloud is still a viable option for those who have the capabilities and special needs. But public cloud evolution is ruthless, with ever-expanding offerings. It’s really hard to resist. The crowd goes with the flow, so it’s important to stop, understand the big picture, and think about the best way to execute your own business vision at the IT platform level.

