Wednesday, August 6, 2025

SNOWFLAKE Interview Questions 2025

1️⃣ Choosing the Perfect Size of Virtual Warehouse

To select the right size (X-Small to 6X-Large), consider:

  • Data Volume & Complexity: Larger datasets or complex joins/aggregations may require Medium or Large warehouses.
  • Concurrency Needs: More users or parallel queries? Scale up or use multi-cluster warehouses.
  • Performance SLAs: If low latency is critical, opt for larger sizes or auto-scaling.
  • Cost vs. Speed Trade-off: Start small, monitor query performance via Query Profile, and scale only if needed.

🧠 Tip: Use Query History and Warehouse Load Charts to analyze CPU usage and queue times before resizing.
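
For example, a multi-cluster warehouse with auto-suspend/resume; the name and settings below are illustrative, not a recommendation (multi-cluster requires Enterprise edition or higher):

sql

CREATE WAREHOUSE IF NOT EXISTS analytics_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3       -- scale out for concurrency spikes
  SCALING_POLICY = 'STANDARD'
  AUTO_SUSPEND = 60           -- suspend after 60 s idle
  AUTO_RESUME = TRUE;

-- Scale up only after Query Profile shows spilling or long scan times
ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';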

2️⃣ Optimizing Storage & Compute Cost in Snowflake

Storage Optimization:

  • Use data compression (Snowflake does this automatically).
  • Archive rarely used data to lower-cost external storage (e.g., unload to S3).
  • Drop unused tables/stages, and use transient or temporary tables where Fail-safe protection isn't needed (Fail-safe storage can't be purged manually).

Compute Optimization:

  • Use auto-suspend and auto-resume for warehouses.
  • Schedule jobs during off-peak hours.
  • Use result caching and materialized views for repetitive queries.
  • Avoid over-provisioning; monitor warehouse usage and scale down if underutilized.
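
A minimal sketch of the compute- and storage-side settings (warehouse and table names assumed):

sql

-- Suspend idle compute quickly; resume automatically on the next query
ALTER WAREHOUSE etl_wh SET AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;

-- Cut Time Travel storage on re-loadable staging data
ALTER TABLE staging_events SET DATA_RETENTION_TIME_IN_DAYS = 0;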

3️⃣ Checklist for On-Prem to Snowflake Migration

  • Source System Analysis: Understand schema, data types, volume, and dependencies.
  • Data Quality Checks: Nulls, duplicates, referential integrity.
  • Transformation Logic: Map ETL logic to Snowflake-compatible SQL or ELT.
  • Security & Compliance: Identify PII, encryption needs, access controls.
  • Performance Benchmarking: Compare query performance pre/post migration.
  • Tooling: Choose between ADF, Informatica, or custom scripts for ingestion.
  • Validation Strategy: Row counts, checksums, sample data comparison (see the sketch below).
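
For the validation step, compare row counts and aggregate checksums on both sides; on the Snowflake side (table name assumed):

sql

-- Compare against the equivalent count/checksum computed on the source system
SELECT COUNT(*) AS row_count,
       HASH_AGG(*) AS table_checksum
FROM migrated_table;
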
4️⃣ Clustering vs. Search Optimization Service

| Feature     | Clustering                                  | Search Optimization                        |
|-------------|---------------------------------------------|--------------------------------------------|
| Purpose     | Improve query performance on large tables   | Accelerate point lookup queries            |
| Use Case    | Range scans, filtering on clustered columns | Fast retrieval on high-cardinality columns |
| Maintenance | Automatic reclustering (background service) | Fully managed by Snowflake                 |
| Cost        | Compute credits consumed by reclustering    | Additional cost for optimization service   |

🧠 Use Clustering for large fact tables with predictable filters. Use Search Optimization for fast lookups on semi-structured or sparse data.
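
Illustrative DDL for each approach (table and column names assumed):

sql

-- Clustering: keep a large fact table pruned on its common filter columns
ALTER TABLE fact_sales CLUSTER BY (sale_date, region);

-- Search Optimization: accelerate point lookups on a high-cardinality column
ALTER TABLE events ADD SEARCH OPTIMIZATION ON EQUALITY(user_id);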

5️⃣ Query Performance Optimization Techniques

  • 📌 Use Query Profile to identify bottlenecks.
  • 📌 Avoid SELECT *; project only needed columns.
  • 📌 Use materialized views for expensive aggregations.
  • 📌 Partition large tables using clustering keys.
  • 📌 Leverage result caching and CTAS for intermediate steps.
  • 📌 Rewrite subqueries as joins or CTEs for better optimization.
  • 📌 Ensure proper data types and avoid implicit conversions.
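
For example, replacing SELECT * plus a per-row correlated subquery with an explicit projection and a CTE join:

sql

WITH order_totals AS (
  SELECT customer_id, SUM(amount) AS total_amount
  FROM orders
  GROUP BY customer_id
)
SELECT c.customer_id,
       c.customer_name,
       t.total_amount
FROM customers AS c
JOIN order_totals AS t
  ON t.customer_id = c.customer_id;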

6️⃣ Query to Check Table Usage in Views

Snowflake's INFORMATION_SCHEMA does not include a VIEW_TABLE_USAGE view, so search the view definitions instead:

sql

SELECT table_name AS view_name
FROM information_schema.views
WHERE view_definition ILIKE '%YOUR_TABLE_NAME%';

This checks whether the table is referenced in any view definition within the current database.
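
For exact, account-wide lineage, the SNOWFLAKE.ACCOUNT_USAGE.OBJECT_DEPENDENCIES view can be queried instead (note it lags real time):

sql

SELECT referencing_database, referencing_schema, referencing_object_name AS view_name
FROM snowflake.account_usage.object_dependencies
WHERE referenced_object_name = 'YOUR_TABLE_NAME'
  AND referencing_object_domain = 'VIEW';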

7️⃣ Implementing SCD Type 2 in Snowflake

Use a combination of MERGE and metadata columns. Note that a single MERGE cannot both expire the old version of a changed record and insert its new version, so run two statements:

sql

-- Step 1: close out the current version of changed records
MERGE INTO target_table AS tgt
USING staging_table AS src
ON tgt.business_key = src.business_key AND tgt.is_current = TRUE
WHEN MATCHED AND tgt.hash_diff != src.hash_diff THEN
  UPDATE SET is_current = FALSE, end_date = CURRENT_DATE;

-- Step 2: insert brand-new keys and the new versions of changed keys
INSERT INTO target_table (business_key, attribute1, attribute2, hash_diff, start_date, end_date, is_current)
SELECT src.business_key, src.attribute1, src.attribute2, src.hash_diff, CURRENT_DATE, NULL, TRUE
FROM staging_table AS src
LEFT JOIN target_table AS tgt
  ON tgt.business_key = src.business_key AND tgt.is_current = TRUE
WHERE tgt.business_key IS NULL;

  • is_current: Flag for active record
  • start_date, end_date: Track validity
  • hash_diff: Detect changes efficiently
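
One common way to populate hash_diff during the staging load (column list is illustrative):

sql

SELECT business_key,
       attribute1,
       attribute2,
       MD5(CONCAT_WS('|', COALESCE(attribute1, ''), COALESCE(attribute2, ''))) AS hash_diff
FROM raw_source;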

8️⃣ COPY from Date-wise Folder in S3

Assuming a folder structure like s3://bucket/yyyy-mm-dd/file.csv. A session variable cannot be spliced directly into a stage path, so build and run the COPY statement dynamically with Snowflake Scripting (avoid naming the variable current_date, which collides with the built-in function):

sql

DECLARE
  load_date STRING DEFAULT TO_CHAR(CURRENT_DATE, 'YYYY-MM-DD');
  copy_stmt STRING;
BEGIN
  -- Build the stage path for today's folder and execute the COPY
  copy_stmt := 'COPY INTO my_table FROM @my_s3_stage/' || load_date ||
               '/file.csv FILE_FORMAT = (TYPE = ''CSV'')';
  EXECUTE IMMEDIATE copy_stmt;
END;

You can also use external tables or ADF with dynamic path resolution.

9️⃣ Create Table from Parquet File in S3 (Without Stage Browsing)

sql

CREATE FILE FORMAT IF NOT EXISTS my_parquet_format TYPE = PARQUET;

CREATE OR REPLACE TABLE parquet_table
USING TEMPLATE (
  SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
  FROM TABLE(
    INFER_SCHEMA(
      LOCATION => '@my_s3_stage/path/to/file.parquet',
      FILE_FORMAT => 'my_parquet_format'
    )
  )
);

This uses INFER_SCHEMA to auto-generate the table structure. Note that FILE_FORMAT must reference a named file format object, and the template query aggregates the inferred columns with ARRAY_AGG(OBJECT_CONSTRUCT(*)).

🔟 Identifying PII/Sensitive Data

  • 📌 Use Snowflake's object tags and automated data classification (e.g., SYSTEM$CLASSIFY), if enabled.
  • 📌 Scan column names for patterns like email, ssn, dob, credit_card.
  • 📌 Use data profiling tools (e.g., Great Expectations, Collibra).
  • 📌 Implement column-level masking policies.
  • 📌 Maintain a PII inventory and enforce RBAC.
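
A masking policy sketch (role and object names are illustrative):

sql

CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('PII_ADMIN') THEN val ELSE '*** MASKED ***' END;

ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;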


Basic Questions

  1. What is Snowflake, and how does it differ from traditional databases? Snowflake is a cloud-native data platform that separates compute, storage, and services. Unlike traditional databases, it offers scalability, elasticity, and native support for semi-structured data without infrastructure management.
  2. What are the key features of Snowflake?
    • Separation of compute/storage/services
    • Time Travel & Fail-safe
    • Native support for JSON, Avro, Parquet
    • Secure data sharing
    • Multi-cloud support
    • Auto-scaling virtual warehouses
  3. How does Snowflake handle data storage? Data is stored in compressed, columnar micro-partitions. Storage is centralized and decoupled from compute.
  4. What is Snowflake’s architecture? Three layers:
    • Storage: Centralized, compressed micro-partitions
    • Compute: Virtual warehouses (MPP clusters)
    • Services: Metadata, optimization, security
  5. What are Snowflake’s advantages over other cloud data warehouses?
    • True separation of compute and storage
    • Native semi-structured data support
    • Zero management overhead
    • Cross-cloud replication and sharing
  6. What is the role of virtual warehouses in Snowflake? They provide isolated compute resources for queries and ETL. Can be scaled and run concurrently.
  7. How does Snowflake support multi-cloud environments? Snowflake runs on AWS, Azure, and GCP, allowing cross-cloud replication and failover.
  8. What is Snowflake’s pricing model? Pay-per-use model based on compute credits and storage. Editions include Standard, Enterprise, Business Critical.
  9. What types of workloads can Snowflake handle?
    • Data warehousing
    • Data lake
    • Real-time analytics
    • Machine learning
    • Data sharing
  10. What is Snowflake’s approach to security?
    • End-to-end encryption
    • Role-based access control
    • Network policies
    • Data masking and governance
  11. How does Snowflake handle structured and semi-structured data? Supports the VARIANT data type for JSON, Avro, and Parquet; can query and transform using SQL (see the sketch after this list).
  12. What is Snowflake’s marketplace? A platform to discover and share datasets, applications, and services across Snowflake accounts.
  13. What is Snowflake’s data sharing feature? Enables secure, real-time sharing of data across accounts and clouds without data movement.
  14. How does Snowflake ensure high availability? Built-in redundancy across availability zones. Automatic failover and replication.
  15. What is Snowflake’s caching mechanism? Results caching, metadata caching, and warehouse-level caching improve performance.
  16. How does Snowflake handle concurrency? Multi-cluster warehouses scale automatically to handle concurrent users.
  17. What is Snowflake’s role in data analytics? Acts as a central platform for BI, ML, and predictive analytics with high performance and scalability.
  18. What are Snowflake’s supported programming languages? SQL, Python (via Snowpark), Java, Scala, JavaScript (for stored procedures)
  19. How does Snowflake integrate with third-party tools? Native connectors for Power BI, Tableau, Informatica, dbt, Airflow, and more.
  20. What is Snowflake’s approach to disaster recovery? Cross-region replication, failover groups, and Time Travel ensure resilience.
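
For Q11, a minimal sketch of storing and querying semi-structured JSON with VARIANT (table and JSON shape assumed):

sql

CREATE OR REPLACE TABLE raw_events (payload VARIANT);

-- Path notation with casts; LATERAL FLATTEN explodes arrays
SELECT payload:user.id::NUMBER     AS user_id,
       payload:event_type::STRING AS event_type,
       f.value::STRING            AS tag
FROM raw_events,
     LATERAL FLATTEN(input => payload:tags) AS f;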


Technical Questions

  1. What is Snowflake Time Travel, and how does it work? Allows querying historical data (1 day by default, up to 90 days on Enterprise edition and above). Useful for recovery, auditing, and debugging (see the sketch after this list).
  2. How does Snowflake handle schema changes? Supports dynamic schema evolution and zero-downtime DDL operations.
  3. What is Snowflake’s clustering mechanism? Clustering keys define logical sort order to improve pruning and query performance.
  4. How does Snowflake optimize query performance?
    • Pruning micro-partitions
    • Result caching
    • Materialized views
    • Query profiling and tuning
  5. What is Snowflake’s micro-partitioning feature? Data is automatically divided into micro-partitions (50–500 MB of uncompressed data, roughly 16 MB compressed) for efficient scanning and pruning.
  6. How does Snowflake handle data compression? Columnar storage with automatic compression reduces storage and improves performance.
  7. What is Snowflake’s role in ETL processes? Supports ELT workflows with SQL, Snowpark, and integration with ETL tools.
  8. How does Snowflake handle JSON and XML data? VARIANT data type allows storage and querying of semi-structured formats using SQL functions.
  9. What is Snowflake’s COPY command? Loads data from external stages (S3, Azure Blob, GCS) into Snowflake tables.
  10. How does Snowflake handle data ingestion?
    • Batch: COPY command
    • Streaming: Snowpipe
    • CDC: Streams and Tasks
  11. What is Snowflake’s fail-safe mechanism? Provides 7-day recovery window after Time Travel expires for disaster recovery.
  12. How does Snowflake handle role-based access control? Roles are assigned privileges; users inherit access via roles. Supports role hierarchy.
  13. What is Snowflake’s approach to indexing? No traditional indexes. Uses metadata and clustering for performance.
  14. How does Snowflake handle workload isolation? Separate virtual warehouses ensure isolated compute for different workloads.
  15. What is Snowflake’s approach to metadata management? Metadata is stored in the services layer and used for optimization, governance, and lineage.
  16. How does Snowflake handle stored procedures? Supports stored procedures in SQL (Snowflake Scripting) and JavaScript, plus Python, Java, and Scala via Snowpark, with control-flow logic.
  17. What is Snowflake’s approach to query optimization? Uses cost-based optimizer, pruning, caching, and statistics for efficient execution.
  18. How does Snowflake handle materialized views? Stores precomputed results for faster querying. Automatically refreshed.
  19. What is Snowflake’s approach to data governance? Tags, masking policies, access history, and RBAC support compliance and control.
  20. How does Snowflake handle replication across regions? Replication groups and failover groups enable cross-region data availability.
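
For Q1, Time Travel in practice (table name and offset illustrative):

sql

-- Query the table as it existed one hour ago
SELECT * FROM orders AT (OFFSET => -3600);

-- Recover a dropped table while it is still within the retention window
UNDROP TABLE orders;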

 


Thursday, March 20, 2025

SQLSERVER and ORACLE IMPORTANT Queries

-- SQL Server: column details and full object metadata
EXEC sp_columns 'dbo.tablename';
EXEC sp_help 'dbo.tablename';

-- SQL Server: column names from the catalog views
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'tablename' AND table_schema = 'ETL';

-- SQL Server: current database name (two equivalent ways)
SELECT DB_NAME() AS CurrentDatabaseName;

SELECT name AS CurrentDatabaseName
FROM sys.databases
WHERE database_id = DB_ID();

-- Oracle: generate DDL for a table (object names are stored uppercase by default)
SELECT DBMS_METADATA.GET_DDL('TABLE', 'TABLENAME', 'HR') FROM dual;

