Orihime
<!-- mcp-name: io.github.srinivasan-sundaresan95/orihime -->
   
A cross-repository code knowledge graph for Java/Kotlin/JavaScript/TypeScript codebases. Orihime indexes your source code into an embedded KuzuDB graph database using tree-sitter and exposes the graph through an MCP server (for AI assistants), a local web UI, and a CLI.
Mythology: Orihime (織姫) is Vega, the weaving princess who weaves the fabric of the cosmos. She weaves connections, and this tool weaves your codebase into a single graph.
---
What It Does
- Call graph across repositories: who calls what, across service boundaries, including REST calls resolved to the endpoint they target
- Cross-repo taint analysis: track user-controlled data from HTTP/Kafka/JMS entry points through the call graph to dangerous sinks (SQL injection, path traversal, XXE, deserialization, SSRF, log injection, …)
- Security reports: OWASP Top 10, CWE, PCI DSS, STIG frameworks; second-order injection detection; custom sources/sinks via YAML
- Entry-point reachability filtering: suppress false positives from dead code; only surface findings reachable from real entry points (HTTP handlers, @KafkaListener, @Scheduled, @JmsListener, @RabbitListener)
- Complexity hints: static O(n²) loop detection, N+1 JPA risk, unbounded queries, recursive calls; no profiler needed
- Performance correlation: ingest Gatling/JMeter load test results; correlate with the call graph to find confirmed hotspots and Little's Law capacity ceilings per endpoint
- License compliance: scan Maven/Gradle dependencies against SPDX identifiers; flag GPL/AGPL/LGPL in commercial projects
- Incremental re-index: git blob-hash-based skip; only changed files are re-parsed on subsequent runs
- Multi-language: Java, Kotlin, JavaScript, TypeScript (Next.js, Express, React)
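The blob-hash skip used for incremental re-indexing can be illustrated with a minimal sketch. It assumes a SHA-1 git repository and computes the same object ID that `git hash-object` reports for a file. The helper names `git_blob_hash` and `files_to_reparse` are hypothetical and are not taken from Orihime's code.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # Git hashes the header "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def files_to_reparse(current: dict[str, bytes], indexed: dict[str, str]) -> list[str]:
    """Paths whose blob hash is missing from, or differs from, the stored index.

    `current` maps path -> file bytes; `indexed` maps path -> stored blob hash.
    Both shapes are assumptions made for this illustration.
    """
    return sorted(
        path
        for path, content in current.items()
        if indexed.get(path) != git_blob_hash(content)
    )
```

Files whose hash matches the stored value can be skipped without parsing, which is why a re-index with no changes is cheaper than a cold index.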
---
Quick Start: AI-first (Claude Code)
The primary way to use Orihime is through an AI assistant via MCP. You index once, then ask questions in natural language: no Cypher, no grep, no reading source files.
1. Install
git clone https://github.com/srinivasan-sundaresan95/orihime.git
cd orihime
pip install -e .
2. Register with Claude Code (one-time setup)
python -m orihime register # writes MCP server entry to ~/.claude/settings.json
python -m orihime install-skills # copies Claude Code skills to ~/.claude/skills/
Restart Claude Code. The orihime MCP tools and skills (/orihime-call-flow, /orihime-security-audit, /orihime-perf-analysis, /orihime-change-impact) are now active.
3. Index your repositories
python -m orihime index --repo /path/to/your/service-a --name service-a
python -m orihime index --repo /path/to/your/service-b --name service-b
4. Ask questions
Trace the call flow for GET /api/orders in service-a
Find SQL injection risks in service-b
What breaks if I change OrderService.processPayment?
Which endpoints are approaching saturation?
No source file reads. No grep. Claude uses the graph directly, typically 5–8 tool calls vs 30+ for source-only analysis.
CLI alternative: All operations above are also available as Python commands (python -m orihime index, python -m orihime ui, etc.) if you prefer working outside an AI assistant. See CLI Reference below.
---
Feature Comparison
| Capability | Orihime | GitNexus | SonarQube Community | SonarQube Developer | SonarQube Enterprise |
|---|---|---|---|---|---|
| Cross-repo call graph | β | β | β | β | β |
| REST endpoint resolution | β | β | β | β | β |
| MCP integration (AI assistants) | β | β | β¹ | β¹ | β¹ |
| Claude Code hooks + skills | β | β | β | β | β |
| Cross-file taint (SAST / injection) | β | β | β | β | β |
| Second-order injection | β | β | β | β | β |
| Entry-point reachability filter | β | β | β | β | β |
| Custom sources/sinks (YAML) | β | β | β | β | β² |
| OWASP/CWE/PCI/STIG compliance reports | β | β | β | β | β |
| Argument-level taint (value-flow) | β | β | β | β | β |
| Complexity hints (O(n²), N+1) | β | β | partial | partial | partial |
| I/O fan-out + serial/parallel analysis | β | β | β | β | β |
| Perf ingestion + capacity model | β | β | β | β | β |
| Cross-service cascade risk | β | β | β | β | β |
| License compliance | β | β | β | β | β³ |
| Embedded DB (no server daemon) | β | β | β | β | β |
| Indexes Java / Kotlin | β | β | β | β | β |
| Indexes JS / TS | β | β | β | β | β |
| License | MIT | PolyForm NC | LGPL | Commercial | Commercial |
¹ Via the official sonarqube-mcp-server (SonarSource, production-ready). Works with all SonarQube editions.
² Custom taint sources/sinks require the Advanced Security add-on (Enterprise+).
³ License compliance (SBOM + policy enforcement) requires the Advanced Security add-on (Enterprise+).
GitNexus (PolyForm Non-Commercial) provides cross-repo call graphs and MCP integration across 14 languages including Java and Kotlin. It does not cover SAST, perf analysis, or compliance reporting.
---
MCP Tools Reference
Call Graph
| Tool | Description |
|---|---|
| find_callers(method_fqn) | All methods that call the given method |
| find_callees(method_fqn) | All methods called by the given method |
| blast_radius(method_fqn, max_depth) | Transitive set of callers up to N hops |
| find_endpoint_callers(http_method, path_pattern) | Trace back from an HTTP endpoint to its callers |
| find_implementations(interface_fqn) | All classes implementing an interface |
| find_superclasses(class_fqn, max_depth) | Inheritance chain |
| find_external_calls(repo_name) | All calls to methods outside the indexed repo |
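To make "transitive set of callers up to N hops" concrete, here is a minimal breadth-first sketch over an in-memory reverse call graph. It is an illustration only: the `callers_of` dictionary shape and the function body are assumptions, and Orihime itself answers this query from the KuzuDB graph rather than from Python dictionaries.

```python
from collections import deque


def blast_radius(callers_of: dict[str, set[str]], method_fqn: str, max_depth: int) -> dict[str, int]:
    """Map each transitive caller of `method_fqn` to its hop distance (1..max_depth)."""
    depths: dict[str, int] = {}
    queue = deque([(method_fqn, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested number of hops
        for caller in callers_of.get(current, ()):
            if caller != method_fqn and caller not in depths:
                depths[caller] = depth + 1
                queue.append((caller, depth + 1))
    return depths
```

Because the traversal is breadth-first, each caller is recorded at its shortest hop distance, and cycles in the call graph terminate naturally.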
Discovery
| Tool | Description |
|---|---|
| search_symbol(query) | Full-text search across class/method FQNs |
| get_file_location(fqn) | File path and line number for any class or method |
| list_repos() | All indexed repositories |
| list_branches(repo_name) | All indexed branches for a repo |
| list_endpoints(repo_name) | All HTTP endpoints in a repo |
| list_unresolved_calls(repo_name) | REST calls that couldn't be matched to an endpoint |
| find_repo_dependencies(repo_name) | Cross-service DEPENDS_ON edges |
ORM / JPA
| Tool | Description |
|---|---|
| list_entity_relations(repo_name) | All JPA entity relationships; also used in design review (Phase 1.5) |
| find_eager_fetches(repo_name) | EAGER-fetched collections (N+1 risk) |
Security (SAST)
| Tool | Description |
|---|---|
| find_taint_sinks(repo_name) | All taint sinks reachable in the call graph |
| find_taint_flows(repo_name) | Value-flow taint: argument → parameter across CALLS edges |
| find_cross_service_taint(repo_name, max_depth) | Taint that crosses service boundaries via REST |
| find_second_order_injection(repo_name) | Taint stored to DB then re-read and used as sink |
| find_entry_points(repo_name) | All HTTP/Kafka/Scheduled/JMS/RabbitMQ entry points |
| find_reachable_sinks(repo_name, show_all) | Taint sinks filtered to those reachable from entry points only |
| generate_security_report(repo_name, framework) | Report in OWASP / CWE / PCI / STIG format |
| list_security_config() | Show active sources, sinks, and sanitizers from YAML config |
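The idea behind entry-point reachability filtering can be sketched as a forward traversal from the entry points, keeping only sinks that the traversal visits. This is a simplified illustration under assumed data shapes (`calls` as a caller-to-callees dictionary); the function names are hypothetical and do not describe Orihime's internal implementation.

```python
def reachable_from(entry_points: set[str], calls: dict[str, set[str]]) -> set[str]:
    """All methods reachable from any entry point by following call edges."""
    seen = set(entry_points)
    stack = list(entry_points)
    while stack:
        method = stack.pop()
        for callee in calls.get(method, ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def filter_reachable_sinks(sinks: list[str], entry_points: set[str], calls: dict[str, set[str]]) -> list[str]:
    """Keep only sinks that some entry point can reach; dead-code sinks are dropped."""
    live = reachable_from(entry_points, calls)
    return [sink for sink in sinks if sink in live]
```

A sink that sits in a method no handler, listener, or scheduled job ever calls is excluded, which is the false-positive suppression the table describes.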
Complexity & Performance
| Tool | Description |
|---|---|
| find_complexity_hints(repo_name, min_severity) | Methods flagged with O(n²), N+1, unbounded-query, recursive |
| ingest_perf_results(repo_name, file_path) | Load Gatling simulation.log, JMeter XML, or JSON perf data |
| find_hotspots(repo_name) | Complexity hints × p99 latency, sorted by risk score |
| estimate_capacity(repo_name) | Little's Law capacity per endpoint; flags near-saturation |
| find_cascade_risk(repo_name) | Cross-service cascade: upstream endpoints limited by downstream saturation |
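Little's Law states that the mean number of requests in flight L equals throughput λ times mean latency W. Solving for λ gives a throughput ceiling for a fixed concurrency limit. The sketch below shows that arithmetic only; the 0.7 and 0.9 thresholds and the LOW/MEDIUM/HIGH labels are illustrative assumptions, not values documented by Orihime.

```python
def saturation_rps(ceiling_concurrency: int, mean_latency_ms: float) -> float:
    """Little's Law (L = lambda * W) solved for lambda, in requests per second."""
    if mean_latency_ms <= 0:
        raise ValueError("mean_latency_ms must be positive")
    return ceiling_concurrency * 1000.0 / mean_latency_ms


def risk_level(observed_rps: float, ceiling_concurrency: int, mean_latency_ms: float,
               warn_at: float = 0.7, critical_at: float = 0.9) -> str:
    """Classify an endpoint by how close observed load is to its ceiling.

    The thresholds are hypothetical defaults chosen for this example.
    """
    utilization = observed_rps / saturation_rps(ceiling_concurrency, mean_latency_ms)
    if utilization >= critical_at:
        return "HIGH"
    if utilization >= warn_at:
        return "MEDIUM"
    return "LOW"
```

For example, 200 concurrent slots at 50 ms mean latency give a ceiling of 4,000 requests per second, so an endpoint observed at 3,000 requests per second is running at 75% of capacity.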
License Compliance
| Tool | Description |
|---|---|
| find_license_violations(repo_name, allowed, skip_lookup) | Flag GPL/AGPL/LGPL dependencies via Maven Central |
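As a rough illustration of the check described above, the following sketch flags dependencies whose SPDX identifier belongs to the GPL, AGPL, or LGPL families unless it is explicitly allowed. The dependency map, the prefix rule, and the meaning given to `allowed` are assumptions for this example; the Maven Central lookup that the real tool performs is not modelled.

```python
COPYLEFT_PREFIXES = ("GPL-", "AGPL-", "LGPL-")


def find_violations(dependencies: dict[str, str],
                    allowed: frozenset[str] = frozenset()) -> dict[str, str]:
    """Return {coordinate: spdx_id} for copyleft-licensed dependencies.

    `dependencies` maps a Maven coordinate to an SPDX identifier; identifiers
    listed in `allowed` are never flagged.
    """
    return {
        coordinate: spdx_id
        for coordinate, spdx_id in dependencies.items()
        if spdx_id not in allowed and spdx_id.upper().startswith(COPYLEFT_PREFIXES)
    }
```

Passing `allowed=frozenset({"LGPL-2.1-only"})` would let a project accept one specific LGPL version while still flagging the rest.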
Index
| Tool | Description |
|---|---|
| index_repo_tool(repo_path, repo_name) | Trigger an index from within the MCP session |
---
CLI Reference
All operations are also accessible directly without an AI assistant:
python -m orihime index --repo PATH --name NAME [--db PATH] [--force] [--branch NAME]
python -m orihime ui [--port 7700] [--db PATH]
python -m orihime serve
python -m orihime serve-sse [--port 7702] [--db PATH]
python -m orihime resolve [--db PATH]
python -m orihime write-server [--port 7701] [--db PATH]
python -m orihime register [--db PATH] [--python PATH]
python -m orihime install-skills
| Command | Description |
|---|---|
| index | Parse a repository and write its graph into KuzuDB |
| ui | Start the local web UI on port 7700 |
| serve | Start the MCP server on stdio (for Claude Code, Claude Desktop, any MCP client) |
| serve-sse | Start the MCP server with SSE transport (for CI runners and remote clients) |
| resolve | Match RestCall URL patterns against Endpoints across all indexed repos |
| write-server | Start the write-serialization server for team/server deployments |
| register | Write the Orihime MCP server entry to ~/.claude/settings.json |
| install-skills | Copy bundled skills to the target AI assistant's config dir (--agent claude\|cursor\|codex\|copilot\|all) |
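The matching that resolve performs can be pictured with a small sketch: turn each endpoint path template into a regular expression and test the REST call's URL path against it. Everything here is an assumption for illustration (the `{placeholder}` template syntax, the function names, the tuple shapes); the actual resolution rules are described in the Cross-Repo Resolution doc.

```python
import re
from typing import Optional
from urllib.parse import urlsplit


def template_to_regex(path_template: str) -> "re.Pattern[str]":
    """Turn '/api/orders/{id}' into a regex where each {placeholder} matches one segment."""
    literal_parts = re.split(r"\{[^/{}]+\}", path_template)
    body = "[^/]+".join(re.escape(part) for part in literal_parts)
    return re.compile("^" + body + "/?$")


def resolve_rest_call(http_method: str, url: str,
                      endpoints: list[tuple[str, str]]) -> Optional[tuple[str, str]]:
    """Return the first (method, path_template) endpoint matching the call, or None."""
    path = urlsplit(url).path  # drop scheme, host, and query string
    for endpoint_method, path_template in endpoints:
        if (endpoint_method.upper() == http_method.upper()
                and template_to_regex(path_template).match(path)):
            return (endpoint_method, path_template)
    return None
```

Calls that match no endpoint would stay unresolved, which corresponds to what list_unresolved_calls reports.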
---
Web UI
http://localhost:7700
| Page | Description |
|---|---|
| / | Call graph explorer: search methods, trace callers/callees, visualize CALLS graph |
| /findings | Security + complexity findings table; filter by OWASP category, severity, file |
| /api/… | JSON endpoints backing the UI (also usable directly) |
---
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| ORIHIME_DB_PATH | ~/.orihime/orihime.db | Path to KuzuDB database directory |
| ORIHIME_SERVER_URL | _(unset)_ | URL of the write-serialization server (team mode) |
Custom Sources and Sinks
Create ~/.orihime/security_config.yaml (or set ORIHIME_SECURITY_CONFIG):
sources:
  - method_pattern: ".*getCustomUserInput"
    description: "Custom input source"
sinks:
  - method_pattern: ".*legacyExec"
    sink_type: "COMMAND_INJECTION"
    description: "Legacy shell executor"
sanitizers:
  - method_pattern: ".*sanitizeForLegacy"
The built-in config covers HttpServletRequest, @RequestParam, @PathVariable, @RequestBody, JDBC execute*, JPA native queries, Runtime.exec, ProcessBuilder, XML parsers, ObjectInputStream, Files.get, Paths.get, new URL, logging calls, and more.
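To show how `method_pattern` entries like the ones above might be applied, here is a small sketch that classifies a method FQN against a config dictionary mirroring the YAML. It assumes each pattern is a regular expression that must match the whole FQN (`re.fullmatch`); whether Orihime uses full or partial matching is not stated here, and the `com.example` class names are made up.

```python
import re

# Mirrors the example YAML above; in practice this would be loaded from the file.
CONFIG = {
    "sources": [{"method_pattern": ".*getCustomUserInput"}],
    "sinks": [{"method_pattern": ".*legacyExec", "sink_type": "COMMAND_INJECTION"}],
    "sanitizers": [{"method_pattern": ".*sanitizeForLegacy"}],
}


def classify(method_fqn: str, config: dict = CONFIG) -> list[str]:
    """Return the roles ('sources', 'sinks', 'sanitizers') whose patterns match the FQN."""
    roles = []
    for role in ("sources", "sinks", "sanitizers"):
        rules = config.get(role, [])
        if any(re.fullmatch(rule["method_pattern"], method_fqn) for rule in rules):
            roles.append(role)
    return roles
```

With full matching, the leading `.*` in each pattern is what lets a short method name match any package and class prefix.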
---
Documentation
| Doc | Description |
|---|---|
| MCP Server | All MCP tools with parameters and examples |
| Extractors | How Java/Kotlin/JS/TS are parsed; ExtractResult schema |
| Security Config | Custom sources, sinks, sanitizers; YAML reference |
| CI Integration | GitHub Actions PR review workflow setup |
| Docker | Docker Compose setup for server deployments |
| Adding a Language | How to add a new language extractor |
| Cross-Repo Resolution | How REST calls are matched to endpoints across repos |
---
Team / Server Mode
KuzuDB has a single-writer constraint. In team deployments where multiple developers re-index simultaneously, run the write-serialization server:
# On the shared server β owns the KuzuDB connection
python -m orihime write-server --port 7701 --db /shared/orihime.db
# Each developer's indexer sends writes to the server
ORIHIME_SERVER_URL=http://server:7701 python -m orihime index --repo /path --name my-service
Developers running locally without ORIHIME_SERVER_URL open KuzuDB directly as always. The web UI and MCP server always read directly from KuzuDB (reads do not go through the write server).
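The write-serialization idea can be sketched generically: many producers submit write batches, and a single worker applies them one at a time, so the database only ever sees one writer. The class below is a hypothetical in-process illustration using a queue and one thread; it is not Orihime's write server, which is a separate HTTP service.

```python
import queue
import threading


class SerializedWriter:
    """Funnel writes from many threads through one worker thread."""

    def __init__(self, apply_write):
        self._apply_write = apply_write  # callable that performs one write batch
        self._queue: queue.Queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, batch) -> None:
        """Enqueue a batch; safe to call from any thread."""
        self._queue.put(batch)

    def close(self) -> None:
        """Apply everything already queued, then stop the worker."""
        self._queue.put(None)  # sentinel marks the end of the stream
        self._worker.join()

    def _run(self) -> None:
        while True:
            batch = self._queue.get()
            if batch is None:
                break
            self._apply_write(batch)  # only this thread ever writes
```

Readers are unaffected by this arrangement, which matches the note above that the web UI and MCP server read from KuzuDB directly.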
---
Architecture
Source files
    │
    ▼  tree-sitter (Java, Kotlin, JS, TS)
ParseResult (plain Python dicts, picklable)
    │
    ▼  ProcessPoolExecutor (parallel parse workers)
Phase 2: KuzuDB writes (batched by table, 500-edge transactions)
    │
    ▼
KuzuDB embedded graph ◄──────────────────────────────────┐
    │                                                     │
    ├── MCP server   (FastMCP, stdio)                     │
    ├── Web UI       (Starlette, port 7700)               │
    └── Write server (FastAPI, port 7701, team mode) ─────┘
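The "500-edge transactions" step in the pipeline above amounts to chunking a stream of rows into fixed-size batches before writing. A minimal sketch of that chunking follows; the helper name `batched` and its signature are illustrative, and the KuzuDB write call itself is intentionally left out.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], size: int = 500) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` rows, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while chunk := list(islice(iterator, size)):
        yield chunk
```

Each yielded chunk would then be written in one transaction, so 1,200 edges become three transactions instead of 1,200 single-row writes.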
Graph schema (SCHEMA_VERSION 10):
| Node | Key fields |
|---|---|
| Repo | id, name, root_path |
| File | path, language, blob_hash, branch_name |
| Class | fqn, annotations, is_interface |
| Method | fqn, line_start, annotations, is_entry_point, complexity_hint |
| Endpoint | http_method, path, path_regex |
| RestCall | http_method, url_pattern |
| EntityRelation | source_class, target_class, fetch_type, relation_type |
| PerfSample | endpoint_fqn, p50_ms, p99_ms, rps, source |
| CapacityEstimate | endpoint_fqn, saturation_rps, ceiling_concurrency, risk_level |
| Relationship | Description |
|---|---|
| CALLS | Method → Method; carries callee_name, caller_arg_pos, callee_param_pos |
| CALLS_REST | Method → Endpoint (resolved cross-service call) |
| UNRESOLVED_CALL | Method → RestCall (not yet resolved) |
| CONTAINS_CLASS | File → Class |
| CONTAINS_METHOD | Class → Method |
| EXPOSES | Repo → Endpoint |
| DEPENDS_ON | Repo → Repo (cross-service dependency) |
| EXTENDS | Class → Class |
| IMPLEMENTS | Class → Class |
| HAS_RELATION | Class → EntityRelation |
| OBSERVED_AT | Method → PerfSample |
---
Performance
Query performance (graph DB)
Benchmarked on an 845-file Java/Kotlin service:
| Operation | Time |
|---|---|
| Cold index | ~67s |
| Incremental re-index (no changes) | ~34s |
| find_callers | <5ms |
| blast_radius (depth 3) | <15ms |
| find_taint_sinks (full repo) | <25ms |

Batch write speedup vs naive per-row writes: 12×.
---
AI assistant benchmark: tracing a single call flow
Java/Kotlin codebase (845 + 224 files, measured)
Benchmarked on an 845-file Kotlin service and a 224-file Java service, tracing one controller endpoint through service → repositories → upstream APIs. GitNexus v1.6.3, Orihime v1.9, and a grep+source-read baseline were all measured on the same codebase on the same hardware (WSL2/Ubuntu, Intel i7, 2026-04-30).
| Approach | Cold index | Query latency | Avg tokens/query | Files read |
|---|---|---|---|---|
| Baseline: Claude reads source files directly | n/a | ~4–5 min | ~14,000 | 27 |
| GitNexus v1.6.3 | 51.4s | 2–10s⁴ | ~1,490 | 0 |
| Orihime v1.9 | 66.6s | 3–22ms | ~683 | 0 |
Orihime vs baseline: 95% fewer tokens · 200–1,400× faster queries
Orihime vs GitNexus: 2.2× fewer tokens · 200–1,400× faster queries · MCP-native
The 7 Orihime tool calls produced ~80% of the structural picture (full controller→service→repo→upstream chain, 27 test methods surfaced, resilience wiring discovered automatically). The remaining ~20% (upstream API URLs, auth headers, branch-level control flow) requires targeted source reads, scoped to ~5 specific files rather than 27.
GitNexus's cold index is ~1.3× faster on NTFS (Node.js parse throughput advantage). On native Linux this gap narrows to near parity.
⁴ GitNexus query latency is dominated by live GitHub API round trips (1–3 per query × 500–2,000ms each, rate-limit dependent). Blast radius returned results in the wrong direction (upstream imports rather than downstream dependents).
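The token and cold-index ratios quoted above follow directly from the table. The short calculation below reproduces them from the table's own figures; the variable names are just labels for this worked example.

```python
# Figures copied from the benchmark table above.
baseline_tokens = 14_000
gitnexus_tokens = 1_490
orihime_tokens = 683
gitnexus_cold_index_s = 51.4
orihime_cold_index_s = 66.6

# Share of baseline tokens that Orihime avoids (the "95% fewer tokens" claim).
token_reduction_vs_baseline = 1 - orihime_tokens / baseline_tokens

# How many times more tokens GitNexus used per query (the "2.2x" claim).
token_ratio_vs_gitnexus = gitnexus_tokens / orihime_tokens

# How much longer Orihime's cold index took (the "~1.3x" claim).
cold_index_ratio = orihime_cold_index_s / gitnexus_cold_index_s

print(f"{token_reduction_vs_baseline:.1%} fewer tokens than baseline")
print(f"{token_ratio_vs_gitnexus:.2f}x fewer tokens than GitNexus")
print(f"{cold_index_ratio:.2f}x longer cold index than GitNexus")
```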
---
License
MIT






