
Implementing Cost Optimization for a Leading Cement Manufacturer in India


November 21, 2025

Problem Statement:

The GCP environment faces cost inefficiencies and performance bottlenecks due to a range of misconfigurations and operational oversights. Cost-related issues dominate: idle or forgotten resources, high logging and storage overheads, and unoptimized BigQuery usage all drive up cloud spend. Architectural inefficiencies, such as ineffective data partitioning, poorly designed queries, and unnecessarily activated services, further inflate bills without adding value. On the performance side, null value propagation, complex merge queries, and overconsumption of slots hinder pipeline reliability and query execution. Together, these challenges demand a proactive governance model combining automation, monitoring, and architectural best practices to ensure scalable and cost-efficient GCP operations.

Major issues reported after the AWS-to-GCP migration

High Cloud Cost

The following cost-related issues were identified during the assessment:

  • High Storage Cost
  • High Cloud Logging Cost
  • High Cloud Composer / Cloud Run Cost
  • High Compute Engine Cost
  • High Dataproc Cost

Other Challenges

  • Application / Pipeline Not Running Properly due to null values in source data
  • Null Values Causing Extra Cost 

Detailed overview of the challenges and the approach to fixing them:

1. High Cloud Cost
High Storage Cost

Description: Excessive Cloud Storage usage due to idle resources, old or unused files, and duplicate datasets.

Probable Causes:
  • No lifecycle rules or archival
  • Not using Nearline/Coldline storage
  • Poor residual management
  • Uncompressed CSV/JSON formats
  • Manual provisioning

Impact:
  • Elevated GCP costs
  • Wasted resources
  • Higher BigQuery storage/query costs

Mitigation Approach:
  • Implement storage class tiers (Nearline, Coldline)
  • Apply lifecycle rules for archival
  • Compress data (GZIP)
  • Use Parquet/Avro formats
  • Auto-shutdown scripts for VMs/Dataproc
  • Enable idle cluster detection
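The lifecycle rules above can be expressed as a GCS lifecycle policy. Below is a minimal sketch in Python; the 30/90/365-day thresholds are illustrative assumptions, not figures from this engagement:

```python
import json

# Illustrative thresholds; tune per bucket based on access patterns.
lifecycle_policy = {
    "rule": [
        # Objects older than 30 days move to cheaper Nearline storage.
        {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
         "condition": {"age": 30}},
        # After 90 days, demote further to Coldline.
        {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
         "condition": {"age": 90}},
        # Delete objects older than a year.
        {"action": {"type": "Delete"}, "condition": {"age": 365}},
    ]
}

with open("lifecycle.json", "w") as f:
    json.dump(lifecycle_policy, f, indent=2)
```

The resulting file can then be applied to a bucket with `gsutil lifecycle set lifecycle.json gs://BUCKET_NAME` (bucket name hypothetical).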
High Cloud Logging Cost

Description: Excessive log ingestion and long retention periods.

Probable Causes:
  • Default retention settings
  • Unnecessary DEBUG-level logging

Impact:
  • Costs increase with every GB of logs stored

Mitigation Approach:
  • Set custom retention periods
  • Use log exclusion filters
  • Route logs to BigQuery with partitioning
  • Filter out noisy logs (e.g., health checks)
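To illustrate the effect of the exclusion filters, here is a small Python sketch that mirrors the kind of rules configured in Cloud Logging; the health-check path is a hypothetical example:

```python
def should_exclude(entry: dict) -> bool:
    """Return True if a log entry matches an exclusion rule.

    Mirrors two typical Cloud Logging exclusions: dropping
    DEBUG-level noise and load-balancer health checks.
    """
    if entry.get("severity") == "DEBUG":
        return True
    url = entry.get("httpRequest", {}).get("requestUrl", "")
    if url.endswith("/healthz"):  # hypothetical health-check path
        return True
    return False

entries = [
    {"severity": "DEBUG", "textPayload": "cache miss"},
    {"severity": "INFO", "httpRequest": {"requestUrl": "https://app.example.com/healthz"}},
    {"severity": "ERROR", "textPayload": "pipeline failed"},
]
kept = [e for e in entries if not should_exclude(e)]
```

In production the same intent is expressed as a Cloud Logging exclusion filter, so excluded entries are never ingested (and never billed) at all.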
High Cloud Composer / Cloud Run Cost

Description: Cloud Composer has high base pricing; containers keep running while unutilized.

Probable Causes:
  • Composer environments always running
  • Multiple environments for similar jobs
  • Cloud Run not scaled to zero
  • High memory/CPU allocations
  • Minimum instances > 0

Impact:
  • Costs accrue even when idle
  • Pay for unused compute/memory
  • Increased runtime costs

Mitigation Approach:
  • Use Cloud Scheduler to pause/resume environments
  • Consolidate DAGs
  • Configure Cloud Run to scale to zero
  • Implement auto-scaling policies
  • Optimize container logic
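The pause/resume schedule can be driven by a Cloud Scheduler job invoking a small function such as the sketch below; the 08:00–20:00 window is an assumed business-hours range, not a figure from the engagement:

```python
from datetime import time

# Assumed business-hours window during which pipelines must be available.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(20, 0)

def environment_should_run(now: time) -> bool:
    """Return True if the Composer environment should be (or stay) resumed."""
    return BUSINESS_START <= now < BUSINESS_END
```

Outside the window, the scheduled job pauses the environment (or scales workers down), so Composer's base cost is only paid while DAGs actually need to run.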
High Compute Engine Cost

Description: VMs running beyond required capacity or lifecycle.

Probable Causes:
  • Instances left running
  • Over-provisioned VMs
  • Lack of automation
  • Persistent disks not deleted
  • VMs in premium zones

Impact:
  • Elevated monthly compute costs
  • Inefficient resource utilization
  • Budget overruns

Mitigation Approach:
  • Implement VM lifecycle automation
  • Right-size VMs using the Recommender API
  • Use instance schedules for dev/test
  • Delete unused disks/snapshots
  • Prefer standard zones
  • Enforce labels and budget alerts
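Right-sizing is driven by sustained utilization data. A toy version of the decision logic is sketched below; the thresholds are illustrative, and GCP's Recommender API applies its own models over richer time-series data:

```python
def rightsize(avg_cpu_pct: float, avg_mem_pct: float) -> str:
    """Classify a VM from its sustained utilization averages.

    Thresholds are illustrative assumptions, not the values
    used by GCP's Recommender service.
    """
    if avg_cpu_pct < 20 and avg_mem_pct < 30:
        return "downsize"   # paying for capacity that is never used
    if avg_cpu_pct > 80 or avg_mem_pct > 85:
        return "upsize"     # risk of throttling and slow pipelines
    return "keep"
```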
High Dataproc Cost

Description: Dataproc clusters running continuously at full scale.

Probable Causes:
  • Always-on mode
  • No auto-decommissioning

Impact:
  • Large compute and storage bills

Mitigation Approach:
  • Enable automatic cluster termination
  • Use single-node clusters for dev/testing
  • Schedule shutdowns
  • Move batch workloads to BigQuery/Dataflow
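Automatic termination can be enabled at cluster creation (Dataproc's scheduled deletion supports a `--max-idle` flag). The underlying detection amounts to a simple idle-window check, sketched here with an assumed 30-minute threshold:

```python
from datetime import datetime, timedelta

MAX_IDLE = timedelta(minutes=30)  # assumed threshold, mirroring --max-idle=30m

def cluster_is_idle(last_job_finished: datetime, now: datetime) -> bool:
    """True when no job has completed within the idle window."""
    return now - last_job_finished >= MAX_IDLE
```

With `--max-idle` set, Dataproc performs this check itself and deletes the cluster, so no external monitoring script is required.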

2. Other Challenges
Pipeline Failures Due to Null Values

Description: Pipelines fail or behave unpredictably when nulls appear in critical fields.

Causes:
  • Unclean source data ingestion
  • Lack of schema enforcement
  • No null checks during transformation

Impact:
  • Job failures triggering retries
  • Higher execution costs
  • Unreliable analytics and reporting

Mitigation Approach:
  • Add null checks at ingestion (Dataflow/Dataproc)
  • Validate incoming files with Cloud Functions
  • Store only validated data in BigQuery
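The ingestion-time null checks can be sketched as follows; the field names are hypothetical placeholders for the customer's actual schema:

```python
# Hypothetical critical fields; substitute the real schema's key columns.
CRITICAL_FIELDS = ("plant_id", "dispatch_date", "quantity_mt")

def is_valid(row: dict) -> bool:
    """A row is valid only if every critical field is non-null."""
    return all(row.get(field) is not None for field in CRITICAL_FIELDS)

def partition_rows(rows):
    """Split rows into (valid, rejected) before loading into BigQuery."""
    valid, rejected = [], []
    for row in rows:
        (valid if is_valid(row) else rejected).append(row)
    return valid, rejected
```

Rejected rows go to a quarantine location for review instead of BigQuery, so downstream jobs never see the nulls that previously triggered retries.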
Null Values Causing Extra Cost

Description: Nulls increase processing cost and complicate filtering and join operations.

Causes:
  • Poor data quality at source
  • No filters to exclude nulls during preprocessing

Impact:
  • More scanned bytes → higher BigQuery cost
  • Wasted slot time
  • Incorrect results requiring re-runs

Mitigation Approach:
  • Apply WHERE field IS NOT NULL filters
  • Use Python in Airflow to clean data
  • Cleanse data in Dataflow pipelines
  • Optimize with partition filters and column pruning
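Combining the null filter with partition pruning and column selection keeps scanned bytes down. A sketch of a query builder follows; the table and column names are hypothetical:

```python
def build_query(table: str, start: str, end: str, columns: list) -> str:
    """Build a BigQuery SQL statement that prunes partitions,
    selects only the needed columns, and excludes null join keys."""
    cols = ", ".join(columns)
    return (
        f"SELECT {cols} "
        f"FROM `{table}` "
        f"WHERE dispatch_date BETWEEN '{start}' AND '{end}' "  # partition filter
        f"AND plant_id IS NOT NULL"  # hypothetical join key
    )
```

Because BigQuery bills on-demand queries by bytes scanned, the partition filter and column pruning reduce cost directly, while the null filter avoids wasted slot time on rows that would be dropped anyway.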