We spun up a couple of cloud solutions for WFH (working from home).
The first issue was capacity in the regions we are allowed to use. Because of a lack of capacity in those regions, restrictions applied to, for example, the types of compute instance we could use.
The second element was performance. You really do have to pay to get it. Or you Netflix it up: spin up an instance, performance-test it, and if it doesn't come up to spec, kill the instance and spin up another one.
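The "Netflix it up" approach boils down to a retry loop. Here is a minimal sketch in Python. It assumes hypothetical `provision`, `benchmark` and `terminate` callables that you would write around whichever cloud SDK you use, and it assumes a single benchmark score where higher is better. None of these names come from a real provider API.

```python
from typing import Callable, Optional, TypeVar

Instance = TypeVar("Instance")


def provision_until_in_spec(
    provision: Callable[[], Instance],          # hypothetical: spins up one instance
    benchmark: Callable[[Instance], float],     # hypothetical: returns a perf score
    terminate: Callable[[Instance], None],      # hypothetical: kills the instance
    min_score: float,
    max_attempts: int = 5,
) -> Optional[Instance]:
    """Spin up instances until one benchmarks at or above min_score.

    Instances that miss the spec are terminated so we stop paying for them.
    Returns None if nothing meets spec within max_attempts.
    """
    for _ in range(max_attempts):
        instance = provision()
        try:
            score = benchmark(instance)
        except Exception:
            # Don't leak a running (billable) instance if the benchmark blows up.
            terminate(instance)
            raise
        if score >= min_score:
            return instance
        terminate(instance)
    return None


if __name__ == "__main__":
    # Fake provider: instance ids 1, 2, 3... with made-up benchmark scores.
    scores = iter([0.7, 0.9, 1.2])
    ids = iter(range(1, 100))
    terminated = []
    chosen = provision_until_in_spec(
        provision=lambda: next(ids),
        benchmark=lambda inst: next(scores),
        terminate=terminated.append,
        min_score=1.0,
    )
    print(chosen, terminated)  # 3 [1, 2]
```

Capping the attempts matters. In a capacity-constrained region you could otherwise loop indefinitely, paying for each instance you spin up and discard.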
Then there's orchestration. We are getting that down to a fine art for our on-prem infrastructure. Even installing servers is trivial: stack, rack, push configuration. We automated the cloud configuration, and the sheer number of API calls we finished up with was surprising. We also found a lot of inconsistencies in the APIs, which added to the complexity and the weight of it. We evaluated a few tools to help, but they fell into two camps. The tools in one camp are all-singing, all-dancing, but carry some fairly stifling assumptions and constraints. The ones in the other camp are much more flexible, but you have to do a lot more work yourself. So we wrote our own, perhaps naively underestimating the number of moving parts we would end up with.
The third surprise, I think, was how long provisioning some services can take. Many of the APIs are asynchronous, and if you have dependencies, your orchestration has to sit there polling endpoints until the provisioning tasks are complete.
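Here is a rough sketch of that "start what you can, then sit there polling" pattern, using the standard library's `graphlib.TopologicalSorter` to respect dependencies. The `start` and `status` callables are hypothetical wrappers around a provider's async API, and the status strings `"PENDING"`, `"SUCCEEDED"` and `"FAILED"` are assumptions. Real providers each use their own states, which is part of the inconsistency problem.

```python
import time
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List


def provision_in_order(
    deps: Dict[str, Iterable[str]],   # resource -> resources it depends on
    start: Callable[[str], str],      # hypothetical: kicks off async job, returns job id
    status: Callable[[str], str],     # hypothetical: "PENDING" | "SUCCEEDED" | "FAILED"
    poll_interval: float = 5.0,
    timeout: float = 1800.0,
) -> List[str]:
    """Provision resources in dependency order, polling async jobs to completion.

    Returns resource names in the order they finished provisioning.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # also raises CycleError if the dependencies are circular
    in_flight: Dict[str, str] = {}  # job id -> resource name
    finished: List[str] = []
    deadline = time.monotonic() + timeout

    while sorter.is_active():
        # Start everything whose dependencies have all completed.
        for resource in sorter.get_ready():
            in_flight[start(resource)] = resource

        # Poll outstanding jobs; nothing downstream can start until they finish.
        for job_id, resource in list(in_flight.items()):
            state = status(job_id)
            if state == "SUCCEEDED":
                del in_flight[job_id]
                sorter.done(resource)  # may unblock dependants on the next pass
                finished.append(resource)
            elif state == "FAILED":
                raise RuntimeError(f"provisioning {resource} failed")

        if in_flight:
            if time.monotonic() > deadline:
                raise TimeoutError(f"still waiting on {sorted(in_flight.values())}")
            time.sleep(poll_interval)

    return finished
```

Independent resources are started together, so the total wait is roughly the longest dependency chain rather than the sum of every provisioning time.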
The final niggle, I feel, is that many of the services and appliances available have yet to reach maturity. In many cases you end up having to choose between a service, an appliance, or just getting a compute instance and provisioning the service yourself. That leads to a lot of evaluation, testing, and trial and error.
In terms of cost, we regularly compare our on-prem costs to equivalent cloud hosting costs. Colo fees for on-prem are not cheap. Hardware is not cheap. There are also costs in testing hardware, driving to data centres, and support contracts. But cloud hosting always comes out more expensive. Way, way more expensive.
We have fairly comprehensive monitoring of our cloud infrastructure, run from our data centres and other locations. Connectivity into our data centres is much more reliable. The number of timeouts I see day to day on simple checks against cloud-hosted services vastly outnumbers those I see for our on-prem endpoints. And that's with less than 5% of our infrastructure cloud hosted.
Cloud is very useful for some things. CDNs. Messaging. Queues. Things you have reasons not to host on your own infrastructure. Things for which latency and performance don't matter. Geo-redundant storage of backups. Spiking out ideas. Start-ups with little capital, etc.