
Implementing Cost Optimization for a Leading Cement Manufacturer in India


November 21, 2025

Problem Statement:

The GCP environment faces cost inefficiencies and performance bottlenecks due to a range of misconfigurations and operational oversights. Cost-related issues dominate: idle or forgotten resources, high logging and storage overheads, and unoptimized BigQuery usage all drive up cloud spend. Architectural inefficiencies, such as ineffective data partitioning, poorly designed queries, and unnecessarily activated services, further inflate bills without adding value. On the performance side, null value propagation, complex merge queries, and overconsumption of slots hinder pipeline reliability and query execution. Together, these challenges demand a proactive governance model combining automation, monitoring, and architectural best practices to ensure scalable and cost-efficient GCP operations.

Major issues reported after the AWS-to-GCP migration

High Cloud Cost

The following cost-related issues were identified during the assessment:

  • High Storage Cost
  • High Cloud Logging Cost
  • High Cloud Composer / Cloud Run Cost
  • High Compute Engine Cost
  • High Dataproc Cost

Other Challenges

  • Application / Pipeline Not Running Properly due to null values in source data
  • Null Values Causing Extra Cost 

Detailed overview of the challenges and the approach to fixing them:

1. High Cloud Cost
High Storage Cost

Description: Excessive Cloud Storage usage due to idle resources, old or unused files, and duplicate datasets.

Probable Causes:
  • No lifecycle rules or archival
  • Not using Nearline/Coldline storage
  • Poor residual management
  • Uncompressed CSV/JSON formats
  • Manual provisioning

Impact:
  • Elevated GCP costs
  • Wasted resources
  • Higher BigQuery storage/query costs

Mitigation Approach:
  • Implement storage class tiers (Nearline, Coldline)
  • Apply lifecycle rules for archival
  • Compress data (GZIP)
  • Use Parquet/Avro formats
  • Auto-shutdown scripts for VMs/Dataproc
  • Enable idle cluster detection
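The lifecycle rules above can be expressed as a GCS lifecycle policy. Below is a minimal sketch in Python; the 30/90/365-day thresholds are illustrative assumptions, not figures from this engagement:

```python
import json

# Illustrative thresholds; tune per bucket based on access patterns.
lifecycle_policy = {
    "rule": [
        # Objects older than 30 days move to cheaper Nearline storage.
        {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
         "condition": {"age": 30}},
        # After 90 days, demote further to Coldline.
        {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
         "condition": {"age": 90}},
        # Delete objects older than a year.
        {"action": {"type": "Delete"}, "condition": {"age": 365}},
    ]
}

with open("lifecycle.json", "w") as f:
    json.dump(lifecycle_policy, f, indent=2)
```

The resulting file can then be applied to a bucket with `gsutil lifecycle set lifecycle.json gs://BUCKET_NAME` (bucket name hypothetical).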
High Cloud Logging Cost

Description: Excessive log ingestion and long retention periods.

Probable Causes:
  • Default retention settings
  • Unnecessary DEBUG-level logging

Impact:
  • Costs increase with every GB of logs stored

Mitigation Approach:
  • Set custom retention periods
  • Use log exclusion filters
  • Route logs to BigQuery with partitioning
  • Filter out noisy logs (e.g., health checks)
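To illustrate the effect of the exclusion filters, here is a small Python sketch that mirrors the kind of rules configured in Cloud Logging; the health-check path is a hypothetical example:

```python
def should_exclude(entry: dict) -> bool:
    """Return True if a log entry matches an exclusion rule.

    Mirrors two typical Cloud Logging exclusions: dropping
    DEBUG-level noise and load-balancer health checks.
    """
    if entry.get("severity") == "DEBUG":
        return True
    url = entry.get("httpRequest", {}).get("requestUrl", "")
    if url.endswith("/healthz"):  # hypothetical health-check path
        return True
    return False

entries = [
    {"severity": "DEBUG", "textPayload": "cache miss"},
    {"severity": "INFO", "httpRequest": {"requestUrl": "https://app.example.com/healthz"}},
    {"severity": "ERROR", "textPayload": "pipeline failed"},
]
kept = [e for e in entries if not should_exclude(e)]
```

In production the same intent is expressed as a Cloud Logging exclusion filter, so excluded entries are never ingested (and never billed) at all.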
High Cloud Composer / Cloud Run Cost

Description: Cloud Composer has high base pricing; containers keep running while unutilized.

Probable Causes:
  • Composer environments always running
  • Multiple environments for similar jobs
  • Cloud Run not scaled to zero
  • High memory/CPU allocations
  • Minimum instances > 0

Impact:
  • Costs accrue even when idle
  • Pay for unused compute/memory
  • Increased runtime costs

Mitigation Approach:
  • Use Cloud Scheduler to pause/resume environments
  • Consolidate DAGs
  • Configure Cloud Run to scale to zero
  • Implement auto-scaling policies
  • Optimize container logic
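The pause/resume schedule can be driven by a Cloud Scheduler job invoking a small function such as the sketch below; the 08:00–20:00 window is an assumed business-hours range, not a figure from the engagement:

```python
from datetime import time

# Assumed business-hours window during which pipelines must be available.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(20, 0)

def environment_should_run(now: time) -> bool:
    """Return True if the Composer environment should be (or stay) resumed."""
    return BUSINESS_START <= now < BUSINESS_END
```

Outside the window, the scheduled job pauses the environment (or scales workers down), so Composer's base cost is only paid while DAGs actually need to run.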
High Compute Engine Cost

Description: VMs running beyond required capacity or lifecycle.

Probable Causes:
  • Instances left running
  • Over-provisioned VMs
  • Lack of automation
  • Persistent disks not deleted
  • VMs in premium zones

Impact:
  • Elevated monthly compute costs
  • Inefficient resource utilization
  • Budget overruns

Mitigation Approach:
  • Implement VM lifecycle automation
  • Right-size VMs using the Recommender API
  • Use instance schedules for dev/test
  • Delete unused disks/snapshots
  • Prefer standard zones
  • Enforce labels and budget alerts
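Right-sizing is driven by sustained utilization data. A toy version of the decision logic is sketched below; the thresholds are illustrative, and GCP's Recommender API applies its own models over richer time-series data:

```python
def rightsize(avg_cpu_pct: float, avg_mem_pct: float) -> str:
    """Classify a VM from its sustained utilization averages.

    Thresholds are illustrative assumptions, not the values
    used by GCP's Recommender service.
    """
    if avg_cpu_pct < 20 and avg_mem_pct < 30:
        return "downsize"   # paying for capacity that is never used
    if avg_cpu_pct > 80 or avg_mem_pct > 85:
        return "upsize"     # risk of throttling and slow pipelines
    return "keep"
```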
High Dataproc Cost

Description: Dataproc clusters running continuously at full scale.

Probable Causes:
  • Always-on mode
  • No auto-decommissioning

Impact:
  • Large compute and storage bills

Mitigation Approach:
  • Enable automatic cluster termination
  • Use single-node clusters for dev/testing
  • Schedule shutdowns
  • Move batch workloads to BigQuery/Dataflow
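Automatic termination can be enabled at cluster creation (Dataproc's scheduled deletion supports a `--max-idle` flag). The underlying detection amounts to a simple idle-window check, sketched here with an assumed 30-minute threshold:

```python
from datetime import datetime, timedelta

MAX_IDLE = timedelta(minutes=30)  # assumed threshold, mirroring --max-idle=30m

def cluster_is_idle(last_job_finished: datetime, now: datetime) -> bool:
    """True when no job has completed within the idle window."""
    return now - last_job_finished >= MAX_IDLE
```

With `--max-idle` set, Dataproc performs this check itself and deletes the cluster, so no external monitoring script is required.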

2. Other Challenges
Pipeline Failures Due to Null Values

Description: Pipelines fail or behave unpredictably when nulls appear in critical fields.

Causes:
  • Unclean source data ingestion
  • Lack of schema enforcement
  • No null checks during transformation

Impact:
  • Job failures triggering retries
  • Higher execution costs
  • Unreliable analytics and reporting

Mitigation Approach:
  • Add null checks at ingestion (Dataflow/Dataproc)
  • Validate incoming files with Cloud Functions
  • Store only validated data in BigQuery
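The ingestion-time null checks can be sketched as follows; the field names are hypothetical placeholders for the customer's actual schema:

```python
# Hypothetical critical fields; substitute the real schema's key columns.
CRITICAL_FIELDS = ("plant_id", "dispatch_date", "quantity_mt")

def is_valid(row: dict) -> bool:
    """A row is valid only if every critical field is non-null."""
    return all(row.get(field) is not None for field in CRITICAL_FIELDS)

def partition_rows(rows):
    """Split rows into (valid, rejected) before loading into BigQuery."""
    valid, rejected = [], []
    for row in rows:
        (valid if is_valid(row) else rejected).append(row)
    return valid, rejected
```

Rejected rows go to a quarantine location for review instead of BigQuery, so downstream jobs never see the nulls that previously triggered retries.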
Null Values Causing Extra Cost

Description: Nulls increase processing cost and complicate filtering and join operations.

Causes:
  • Poor data quality at source
  • No filters to exclude nulls during preprocessing

Impact:
  • More scanned bytes → higher BigQuery cost
  • Wasted slot time
  • Incorrect results requiring re-runs

Mitigation Approach:
  • Apply WHERE field IS NOT NULL filters
  • Use Python in Airflow to clean data
  • Cleanse data in Dataflow pipelines
  • Optimize with partition filters and column pruning
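Combining the null filter with partition pruning and column selection keeps scanned bytes down. A sketch of a query builder follows; the table and column names are hypothetical:

```python
def build_query(table: str, start: str, end: str, columns: list) -> str:
    """Build a BigQuery SQL statement that prunes partitions,
    selects only the needed columns, and excludes null join keys."""
    cols = ", ".join(columns)
    return (
        f"SELECT {cols} "
        f"FROM `{table}` "
        f"WHERE dispatch_date BETWEEN '{start}' AND '{end}' "  # partition filter
        f"AND plant_id IS NOT NULL"  # hypothetical join key
    )
```

Because BigQuery bills on-demand queries by bytes scanned, the partition filter and column pruning reduce cost directly, while the null filter avoids wasted slot time on rows that would be dropped anyway.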