From Local Inference to Hybrid Cloud: Setting Up Ollama on Apple Silicon and Connecting It to GCP
This post covers Phase 1 and Phase 2 of my hybrid AI inference platform. Phase 1 gets local inference running on a Mac Mini. Phase 2 builds the cloud side with Terraform and connects the two through a private network. By the end, a GCP VM in us-central1 is calling an AI model on my Mac Mini in Lincoln, UK, through an encrypted tunnel.

Part 1: Local Inference

Why local inference matters

Cloud AI inference is powerful but expensive. Every API call to Vertex AI or OpenAI costs money. For straightforward tasks like summarisation, Q&A, and code explanation, a smaller model running on your own hardware handles the job without the bill. ...
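To make "local inference" concrete before the setup steps: once Ollama is running, it exposes a REST API on the machine itself, so a summarisation task becomes a plain HTTP call with no per-token bill. Here is a minimal sketch of such a client, assuming Ollama's default port (11434) and that some model (the name `llama3.2` below is illustrative) has already been pulled:

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt: str, model: str = "llama3.2") -> dict:
    """Construct the JSON payload for a non-streaming /api/generate call."""
    return {"model": model, "prompt": prompt, "stream": False}

def summarise(text: str, model: str = "llama3.2") -> str:
    """Send a summarisation prompt to the local Ollama server and return the reply."""
    payload = build_request(f"Summarise this in two sentences:\n\n{text}", model)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Non-streaming responses carry the full completion in "response".
        return json.loads(resp.read())["response"]
```

The same endpoint is what the GCP VM will eventually hit over the tunnel in Phase 2; only the hostname changes, which is what makes the hybrid setup transparent to callers.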