Oracle AI Data Platform Workbench Samples

This repository contains a curated collection of sample notebooks demonstrating how to build data pipelines, run machine learning workloads, and integrate AI capabilities using Oracle AI Data Platform (AIDP) Workbench — a unified, governed workspace for data engineering, ML, and AI development powered by Apache Spark.

What is Oracle AI Data Platform Workbench?

Oracle AI Data Platform Workbench is a unified, governed workspace for building, managing, and deploying AI and data-driven solutions. It brings together notebooks, agent development, orchestration, and catalog management in a single collaborative platform — empowering teams to explore data, fine-tune models, and operationalize AI with trust and speed.

Learn more about AIDP Workbench →

---

Repository Structure

oracle-aidp-samples/
├── getting-started/          # Foundational notebooks for new users
│   ├── Delta_Lake/           # Delta Lake feature walkthroughs
│   └── migration/            # Migrating workloads to AIDP
├── data-engineering/
│   ├── ingestion/            # Connectors and data loading patterns
│   └── transformation/       # Pipeline architectures and table formats
│       ├── liquid-clustering/
│       ├── medallion-lake/
│       ├── scd/
│       └── streaming/
├── ai/
│   ├── agent-flows/          # Agent orchestration and scheduling
│   └── ml-datascience/       # ML, LLM, and AI service integrations
└── shared-utils/             # Reusable utilities and data generators

---

Sample Catalog

Getting Started

Foundational examples to help you get up and running on AIDP Workbench.

| Notebook | Description |
|---|---|
| Access ALH Data | Write and query data in Oracle Autonomous AI Lakehouse (ALH) using PySpark insertInto and SQL INSERT statements with external catalogs. |
| Access Object Storage Data | Read and write data from OCI Object Storage using direct access, external volumes, and external tables. |
| Analyse Data Using PySpark | PySpark fundamentals: catalog and schema setup, table creation, data insertion, schema exploration, and matplotlib visualizations. |
| Analyse Data Using SQL | Core SQL operations on AIDP including DataFrame creation, transformations, aggregations, and simple visualizations. |
| ALH External Catalog MERGE | End-to-end MERGE workflow into an ALH table via an AIDP external catalog: insert/update/delete with merge keys and OOS-staging skip optimization. |

Delta Lake

| Notebook | Description |
|---|---|
| Use Delta Lake Table | Comprehensive guide covering Delta table operations: updates, merges, time travel, liquid clustering, and vacuuming. |
| Delta Change Data Feed | Capture row-level changes (inserts, updates, deletes) from Delta tables for CDC, incremental processing, and streaming pipelines. |
| Handle Schema Evolution | Add and evolve columns in Delta tables without rewriting existing data, leveraging automatic schema evolution. |
| Delta UniForm Tables | Create Delta UniForm tables that automatically synchronize Iceberg metadata for cross-format interoperability. |

Migration

| Notebook | Description |
|---|---|
| Migrate Files from Databricks to AIDP | Recursively export notebooks and files from a Databricks workspace to AIDP using the databricks-sdk library. |
| Download from Git to AIDP | Download notebooks and files from a Git repository as a ZIP archive and extract them directly into an AIDP workspace volume. |

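The download-and-extract pattern described for the Git sample can be sketched with the Python standard library alone. This is a minimal illustration, not the notebook's actual code: the function names `extract_zip_bytes` and `download_and_extract`, the archive URL, and the destination path are all hypothetical, and the path check is an added safety measure that the sample may or may not include.

```python
import io
import urllib.request
import zipfile
from pathlib import Path


def extract_zip_bytes(archive: bytes, dest_dir: str) -> list[str]:
    """Extract an in-memory ZIP archive into dest_dir; return the written file paths."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for member in zf.infolist():
            target = (dest / member.filename).resolve()
            # Refuse entries that would land outside the destination ("zip slip").
            if not target.is_relative_to(dest):
                raise ValueError(f"unsafe path in archive: {member.filename}")
            if member.is_dir():
                target.mkdir(parents=True, exist_ok=True)
                continue
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(member))
            extracted.append(str(target))
    return extracted


def download_and_extract(zip_url: str, dest_dir: str) -> list[str]:
    """Download a ZIP archive over HTTP(S) and extract it (URL and path are caller-supplied)."""
    with urllib.request.urlopen(zip_url) as response:
        return extract_zip_bytes(response.read(), dest_dir)
```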
---

Data Engineering — Ingestion

Patterns for connecting to and loading data from a wide range of sources.

| Notebook | Description |
|---|---|
| Read/Write Oracle Ecosystem Connectors | Connect to Oracle Database, Oracle Exadata, ALH, and ATP with external catalog support and SQL pushdown. |
| Read/Write External Ecosystem Connectors | Read/write operations with Hive Metastore, Microsoft SQL Server, PostgreSQL, and MySQL. |
| Read-Only Ingestion Connectors | Use read-only connectors for MySQL HeatWave, REST APIs, Oracle Fusion BICC, Kafka, and other sources. |
| Connect Using Custom JDBC Driver | Integrate custom JDBC drivers (e.g., SQLite, Snowflake) with Spark for connecting to databases not bundled by default. |
| Execute Oracle ALH SQL | Execute SQL statements directly against Oracle ALH using the oracledb Python package. |
| Ingest Data Using YAML | Config-driven ingestion from cloud storage (CSV, JSON) and JDBC sources with schema validation and data quality checks. |
| Ingest from Multi-Cloud | Ingest data from Azure Data Lake Storage (ADLS) and AWS S3 with proper JAR configuration and credential management. |
| Ingest into Apache Iceberg (OCI Native) | End-to-end Apache Iceberg workflow: table creation, querying, schema evolution, time travel, and metadata inspection using OCI native protocol and Hadoop catalog. |
| Pipe-Delimited File Ingestion | Read pipe-delimited (\|) files from OCI Object Storage and register them as external tables. |
| Read Excel Files | Read Excel (.xlsx) files using the Spark Excel connector and convert them to Spark DataFrames or CSV. |
| Streaming from OCI Streaming Service | Consume messages from OCI Streaming (Kafka-compatible) using Spark Structured Streaming with SASL/OAUTHBearer authentication. |
| Streaming from Volume Path | Process CSV files from a workspace volume using one-time micro-batch streaming with Trigger.Once(). |

---

Data Engineering — Transformation

Architectural patterns and pipeline templates for data transformation at scale.

Medallion Architecture

Implements the Bronze → Silver → Gold lakehouse pattern with data quality checks and aggregations. Industry variants available:

| Notebook | Description |
|---|---|
| Education | Education analytics pipeline |
| Energy | Energy consumption and reporting |
| Financial Services | Financial transactions and risk |
| Healthcare | Patient records and clinical data |
| Hospitality | Hotel bookings and guest analytics |
| Insurance | Policy and claims processing |
| Manufacturing | Production line and quality data |
| Media | Content engagement and subscriptions |
| Real Estate | Property listings and transactions |
| Retail | Sales, inventory, and customer data |
| Telecommunications | Network usage and customer churn |
| Transportation | Logistics and fleet tracking |

Delta Liquid Clustering

Demonstrates Delta Lake liquid clustering for automatic query optimization and data layout management. Industry variants available:

| Notebook | Description |
|---|---|
| Education | Student performance analytics with ML prediction |
| Energy | Smart grid monitoring and anomaly detection |
| Financial Services | Transaction analytics and reporting |
| Healthcare | Patient data access patterns |
| Hospitality | Booking and occupancy analytics |
| Insurance | Claims and policy data optimization |
| Manufacturing | Production and quality metrics |
| Media | Content and engagement data |
| Real Estate | Property and transaction data |
| Retail | Sales and inventory analytics |
| Telecommunications | Network and customer usage data |
| Transportation | Fleet and logistics optimization |

Apache Iceberg Uniform Liquid Clustering

Combines Delta UniForm with Apache Iceberg Liquid Clustering for open-format, cross-engine table optimization. Industry variants available:

| Notebook | Description |
|---|---|
| Education | Student performance data |
| Energy | Grid and sensor data |
| Financial Services | Transaction and risk data |
| Healthcare | Clinical and patient records |
| Hospitality | Booking and revenue data |
| Insurance | Policy and claims data |
| Manufacturing | Production and IoT data |
| Media | Content delivery data |
| Real Estate | Property listings data |
| Retail | Sales and inventory data |
| Telecommunications | Network usage data |
| Transportation | Fleet and route data |

Other Transformation Patterns

| Notebook | Description |
|---|---|
| Slowly Changing Dimensions (SCD Type 2) | Track historical changes to dimension records using SCD Type 2 with Jinja2-templated merge logic. |
| Streaming — Energy Delta Liquid Clustering | Real-time smart grid monitoring with streaming Delta tables, anomaly detection, and statistical baselines for energy consumption. |
| Streaming — Manufacturing Delta Liquid Clustering | Continuous ingestion and clustering of manufacturing sensor data using Spark Structured Streaming and Delta Lake. |

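The bookkeeping behind SCD Type 2 can be shown without Spark. The sketch below is plain Python and is not the notebook's Jinja2-templated merge: the `DimRow` class, the `apply_scd2` function, and the column names (`valid_from`, `valid_to`, `is_current`) are hypothetical names chosen for illustration. When a tracked attribute changes, the current row is closed rather than overwritten and a new current row is appended, so earlier versions stay queryable.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    key: str                         # business key of the dimension record
    attrs: tuple                     # tracked attribute values
    valid_from: date
    valid_to: Optional[date] = None  # None marks the open-ended current version
    is_current: bool = True


def apply_scd2(history: list, incoming: dict, as_of: date) -> list:
    """Apply incoming {key: attrs} to the history and return the new history."""
    current = {row.key: row for row in history if row.is_current}
    result = []
    for row in history:
        changed = (
            row.is_current
            and row.key in incoming
            and incoming[row.key] != row.attrs
        )
        # Close the superseded version instead of updating it in place.
        result.append(replace(row, valid_to=as_of, is_current=False) if changed else row)
    for key, attrs in incoming.items():
        # Insert a new current version for new keys and for changed attributes.
        if key not in current or current[key].attrs != attrs:
            result.append(DimRow(key, attrs, valid_from=as_of))
    return result
```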
Cross-Format & External Table Interop

| Sample | Description |
|---|---|
| ADW External Table on Delta UniForm | Automates recreating an ADW Iceberg external table against the latest UniForm-generated metadata file when a UniForm-enabled Delta table evolves. Uses ADW, Python, and a stored procedure that resolves the newest vN.metadata.json. |

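The sample resolves the newest metadata file in a stored procedure. The same selection rule can be sketched in Python, assuming file names follow the vN.metadata.json pattern named above. The function name `newest_metadata_file` and any paths passed to it are hypothetical. The comparison is numeric, because a plain string sort would rank v9 above v10.

```python
import re
from typing import Iterable, Optional

# Matches a file name such as "v12.metadata.json" at the end of a path.
_METADATA_RE = re.compile(r"(?:^|/)v(\d+)\.metadata\.json$")


def newest_metadata_file(paths: Iterable[str]) -> Optional[str]:
    """Return the path with the highest metadata version, or None if none match."""
    best, best_version = None, -1
    for path in paths:
        match = _METADATA_RE.search(path)
        if match and int(match.group(1)) > best_version:
            best, best_version = path, int(match.group(1))
    return best
```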
Other Utilities

| Sample | Description |
|---|---|
| DataFrame PII Masking with AI | PySpark utility that detects and masks PII columns using a pluggable PIIChecker abstraction. Supports Anthropic Claude (Haiku/Sonnet/Opus) and OCI native models via Spark query_model(). |
| Partition-Aware Merge Generator | Helper utility for partition-aware merge operations on Spark DataFrames: PK-based updates, configurable update policies, deletes, and schema evolution. Provides Delta-MERGE-like behaviour without requiring Delta. |

Miscellaneous

| Sample | Description |
|---|---|
| Working with Table Properties | Manage Spark SQL table properties on a managed Delta table: set, read, overwrite, and remove properties, and store structured JSON metadata as a property value. |

---

AI & Machine Learning

Notebooks covering generative AI, NLP, ML model training, and LLM-powered analytics.

| Notebook | Description |
|---|---|
| Sentiment Analysis with OCI GenAI | Perform sentiment analysis on text data using OCI Generative AI (Llama model) via the AIDP query_model function. |
| OCI Language Service Translation | Translate text using OCI AI Language Service via REST API, demonstrated with a round-trip English ↔ Spanish translation. |
| Customer Churn Prediction (GPU) | Train a TensorFlow neural network on GPU for customer churn prediction, including preprocessing, training, and evaluation. |
| LLM Model Output Parser | Use an LLM to parse and translate statistical model outputs into business-friendly insights and plain-language summaries. |
| Natural Language to SQL (NL2SQL) | Introspect a database schema and generate accurate SQL queries from natural language questions using an LLM, with result summarization. |
| Multi-Table NL2SQL with Grouped Analysis | Extend NL2SQL to multi-table scenarios with grouped LLM analysis for complex procurement and supplier-item intelligence. |
| Retrieval-Augmented Generation (RAG) | End-to-end RAG pipeline: ingest documents from OCI Object Storage, chunk and embed text, retrieve relevant context, and generate answers with an LLM. |
| Movie Recommendation System | Build a collaborative filtering recommendation engine using PySpark ML's ALS algorithm, trained and evaluated on movie rating data. |
| Linear Mixed Effects Model | Apply a Linear Mixed Effects Model (LME) with statsmodels and PySpark to analyze student test scores across schools, accounting for fixed and random effects. |

Agent Flows

| Sample | Description |
|---|---|
| Agent Flow Schedule Trigger | Invoke AIDP agent flows via REST API using OCI request signing, demonstrating programmatic agent orchestration with custom message handling. |
| Invoke Agent Flows from APEX | Oracle APEX region plugin that adds a chat UI for AIDP agents, with persistent conversation history, async Oracle AQ-backed response processing, and conversation summarization. |
| Invoke Agent Flows from Streamlit | Streamlit chat app for AIDP agents with streaming responses, trace/span visualization, multiple auth modes (API key, security token, resource principal), and OCI Container Instance deployment. |

Visual (No-Code) Agent Flows

End-to-end labs showing the AIDP visual flow canvas authoring experience.

| Sample | Description |
|---|---|
| Hello World Agent | Minimal conversational agent built on the visual flow canvas. This is the starting template, which grounds answers on the model's training data. |
| Entertainment Industry Analyst | Release & performance analyst combining RAG over internal playbooks/policies with strictly-defined parameterized SQL tools for read-only analytics. |
| ACME Pet Insurance Customer Support | RAG-based customer support agent answering policy questions from PDF documents in a Knowledge Base. |

Custom Tools

Python tool packages that extend agent flows with user-authored capabilities. Upload the ZIP to a workspace volume and wire it into any agent flow. See the Custom Tools User Guide for the full authoring contract.

| Sample | Description |
|---|---|
| Hello Tool | The minimal CustomToolBase tool: one class, one parameter, no dependencies. The starting template for new tools. |
| Developer Toolkit | Three tools in one package (bash execution, file I/O, and Python subprocess execution), demonstrating multi-tool packages and a shared utils/ module. |
| ORDS Database Tool | Query Oracle Autonomous Database via the ORDS REST API with basic auth. Executes SQL, lists tables/views, and describes columns. |

Agent Chat Clients

| Sample | Description |
|---|---|
| AIDP Chat Client — Python Library | Reusable Python client for AIDP Chat Agent endpoints: streaming & non-streaming responses, API Key + Security Token auth, typed APIs, and a standalone test script for quick endpoint verification. |
| AIDP Agent Chat — Web UI | Browser chat UI for any deployed AIDP agent: a Flask proxy that handles OCI request signing plus a single-page HTML frontend. Includes a one-command deploy script for OCI Container Instances. |

Code-First Agent Flows

| Sample | Description |
|---|---|
| Multi-MCP Chat Agent | Natural-language chat agent that fans out across Oracle Autonomous Database (Select AI MCP), Oracle Analytics Cloud (Logical SQL MCP), and Oracle Integration Cloud (project-scoped MCP). Each integration can be enabled or disabled independently via config. |
| ReAct Agent with RAG Tool | Simple code-authored ReAct agent that uses a RAG tool over PDFs stored in an AIDP Knowledge Base. Runs as a code-first agent flow with the standard playground/test loop. |
| Supply Chain Agent | Multi-agent system for supply-chain operations using OCI Generative AI (Grok-4) over AIDP catalog tables. Includes data generation, table provisioning, and an agent configuration walkthrough. |

---

Shared Utilities

| Notebook | Description |
|---|---|
| Data Code Generator | Generate realistic multi-table synthetic datasets from a YAML configuration file, with CSV and JSON export support for testing and prototyping. |
| Data Quality Checker | Run comprehensive data quality checks including null, uniqueness, range, pattern, foreign key, and AI-powered semantic validation across single and multiple tables. |
| OCI Vault Secret Retrieval | Securely retrieve secrets (passwords, API keys, connection strings) from OCI Vault using auto-detected authentication: Resource Principal on AI Data Platform or OCI config file locally. |
| AIDP Customer Workbench Usage UI | Browser UI and read-only local proxy for viewing AIDP Workbench workspaces, compute clusters, notebooks, workflows, and cluster libraries from fixture data or live Workbench REST APIs. |

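To make the null, uniqueness, and range checks listed for the Data Quality Checker concrete, here is a minimal plain-Python sketch over a list of dictionaries. It is an illustration of the check types only, not the notebook's implementation, which presumably runs on Spark tables. The function names and the sample column names are hypothetical.

```python
from collections import Counter


def null_check(rows, column):
    """Return the indexes of rows where the column is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def uniqueness_check(rows, column):
    """Return the sorted values that occur more than once (None is ignored)."""
    counts = Counter(row[column] for row in rows if row.get(column) is not None)
    return sorted(value for value, n in counts.items() if n > 1)


def range_check(rows, column, low, high):
    """Return the indexes of rows whose non-null value lies outside [low, high]."""
    return [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None and not (low <= row[column] <= high)
    ]
```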
---

Developer Tooling

| Sample | Description |
|---|---|
| Claude Code Plugins for AIDP | Anthropic Claude Code plugins published by the Oracle AIDP team. Includes the oracle-ai-data-platform-workbench-spark-connectors plugin, which provides 18 model-invokable skills connecting Spark notebooks to Oracle (ALH/ADW/ATP, ExaCS, Fusion, BICC, EPM, Essbase) and external (PostgreSQL, MySQL/HeatWave, SQL Server, Snowflake, ADLS Gen2, S3, OCI Streaming, Object Storage, Iceberg, REST/JDBC, Excel) sources. |

---

Running the Samples

Prerequisites

Before running any sample, ensure you have:

- An active Oracle AI Data Platform Workbench environment with a compute cluster.
- The required IAM policies configured for the services used (Object Storage, ALH, AI Services, etc.).
- Cluster libraries installed from the requirements.txt file included in the relevant sample folder, where applicable.

General Steps

1. Open your AIDP Workbench notebook environment.
2. Clone or import the samples into your workspace.
3. Navigate to the notebook of your choice and open it.
4. Follow the instructions and prerequisites described in the notebook's opening cells.
5. Attach the notebook to a running compute cluster and execute the cells.

MLflow Tracking Server

Several ML samples integrate with MLflow for experiment tracking. Ensure your AIDP environment has an MLflow Tracking Server configured. Refer to the AIDP documentation for setup instructions.

---

Documentation

---

Get Support

If you encounter issues with these samples, please open an issue in this repository. For questions about Oracle AI Data Platform itself, refer to the OCI Support portal.

---

Security

Please consult the security guide for our responsible security vulnerability disclosure process.

---

Contributing

This project welcomes contributions from the community. Before submitting a pull request, please review our contribution guide.

---

License

See LICENSE
