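Finding relevant code in a large codebase is a common challenge. Traditional grep-based search matches text, so it falls short when you need to find code by what it does rather than by what it is named. CodeGraph addresses this by combining three complementary indexes: syntax, semantics, and relations.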
Three Pillars of Code Understanding
1. Syntax Indexing
Using tree-sitter, we parse code into Abstract Syntax Trees. This gives us a structured view of functions, classes, and their relationships. CodeGraph supports 8+ languages through a unified interface.
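To make the idea of a syntax index concrete, here is a minimal sketch of extracting symbols from a parse tree. It uses Python's standard-library ast module as a stand-in for tree-sitter, so it only handles Python source; the index_symbols function and the shape of its records are assumptions made for illustration, not CodeGraph's actual schema.

```python
import ast


def index_symbols(source: str, path: str = "<memory>") -> list[dict]:
    """Collect function and class definitions with their line and enclosing symbol."""
    tree = ast.parse(source, filename=path)
    symbols = []

    def visit(node, parent=None):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                kind = "class" if isinstance(child, ast.ClassDef) else "function"
                symbols.append({
                    "name": child.name,
                    "kind": kind,
                    "line": child.lineno,
                    "parent": parent,  # enclosing class or function, if any
                    "path": path,
                })
                # Nested definitions are recorded with this symbol as their parent.
                visit(child, child.name)
            else:
                visit(child, parent)

    visit(tree)
    return symbols


SAMPLE = '''
class Session:
    def login(self, user):
        return check(user)

def check(user):
    return bool(user)
'''

if __name__ == "__main__":
    for symbol in index_symbols(SAMPLE, "session.py"):
        print(symbol)
```

A tree-sitter version would walk its own node types per language, but the output would plausibly look similar: one record per definition, with enough location data to jump back to the source.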
2. Semantic Indexing
CodeBERT embeddings capture the meaning of code, so similar functionality written differently tends to end up with similar embeddings. We store these embeddings in ChromaDB for efficient similarity search.
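The store-and-query flow behind similarity search can be sketched without any model at all. In the sketch below, embed is a toy bag-of-tokens stand-in for a CodeBERT embedding and VectorIndex is a hypothetical in-memory stand-in for ChromaDB. The toy embedding only matches shared tokens and does not capture meaning; it is here purely to show how symbols are embedded once, stored, and then ranked by cosine similarity at query time.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a learned embedding: a sparse bag of lowercase word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorIndex:
    """Hypothetical in-memory vector store keyed by symbol id."""

    def __init__(self):
        self._items = {}

    def add(self, symbol_id: str, code: str) -> None:
        self._items[symbol_id] = embed(code)

    def query(self, text: str, k: int = 3) -> list[tuple[str, float]]:
        """Return the k most similar symbols as (symbol_id, score), best first."""
        q = embed(text)
        scored = [(cosine(q, vec), sid) for sid, vec in self._items.items()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(sid, score) for score, sid in scored[:k]]


index = VectorIndex()
index.add("auth.verify_token", "def verify_token(token): return decode(token).user")
index.add("auth.login", "def login(user, password): return check_password(user, password)")
index.add("util.slugify", "def slugify(title): return title.lower().replace(' ', '-')")

if __name__ == "__main__":
    for symbol_id, score in index.query("check user password", k=2):
        print(f"{symbol_id}: {score:.2f}")
```

Swapping in real embeddings would change only embed; the ranking logic is what a vector database performs for you, with indexing structures that avoid scanning every entry.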
3. Relation Indexing
Call graphs and import relationships are stored as a NetworkX graph. This enables queries like "what functions call this method?" or "what are the dependencies of this module?"
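Both example queries reduce to simple graph traversals. The sketch below uses a plain dictionary-based directed graph so that it runs with the standard library alone; the CallGraph class and the symbol names are hypothetical. With NetworkX, a DiGraph's predecessors method and the descendants function cover the same two queries.

```python
from collections import defaultdict, deque


class CallGraph:
    """Directed graph of caller -> callee edges, indexed in both directions."""

    def __init__(self):
        self._callees = defaultdict(set)
        self._callers = defaultdict(set)

    def add_call(self, caller: str, callee: str) -> None:
        self._callees[caller].add(callee)
        self._callers[callee].add(caller)

    def callers(self, symbol: str) -> set[str]:
        """Direct callers: 'what functions call this method?'"""
        return set(self._callers.get(symbol, ()))

    def dependencies(self, symbol: str) -> set[str]:
        """Everything reachable from symbol: 'what does this depend on?'"""
        seen = set()
        queue = deque([symbol])
        while queue:
            current = queue.popleft()
            for callee in self._callees.get(current, ()):
                if callee not in seen:
                    seen.add(callee)
                    queue.append(callee)
        seen.discard(symbol)  # a cycle should not make a symbol its own dependency
        return seen


graph = CallGraph()
graph.add_call("api.login_view", "auth.user_login")
graph.add_call("cli.main", "auth.user_login")
graph.add_call("auth.user_login", "db.find_user")
graph.add_call("db.find_user", "db.connect")

if __name__ == "__main__":
    print(sorted(graph.callers("auth.user_login")))
    print(sorted(graph.dependencies("api.login_view")))
```

Keeping a reverse index of callers makes the "who calls this?" query a single lookup instead of a scan over every edge.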
Incremental Updates
Re-indexing entire codebases on every change is impractical. Our incremental algorithm:
- Detects changed files via filesystem events
- Computes affected symbols
- Updates only relevant index entries
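The three steps above can be sketched as follows. Two things here are assumptions rather than CodeGraph's actual behavior: change detection compares content hashes instead of listening for filesystem events, and "affected symbols" is taken to mean the symbols defined in changed files plus their direct callers. The function names and data shapes are hypothetical.

```python
import hashlib


def detect_changes(files: dict[str, str], known_hashes: dict[str, str]) -> set[str]:
    """Return paths whose content differs from the stored hash, updating the store."""
    changed = set()
    for path, content in files.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if known_hashes.get(path) != digest:
            changed.add(path)
            known_hashes[path] = digest
    return changed


def affected_symbols(
    changed: set[str],
    symbols_by_file: dict[str, set[str]],
    callers: dict[str, set[str]],
) -> set[str]:
    """Symbols defined in changed files, plus the symbols that call them directly."""
    direct = set()
    for path in changed:
        direct |= symbols_by_file.get(path, set())
    affected = set(direct)
    for symbol in direct:
        affected |= callers.get(symbol, set())
    return affected


# Hypothetical project state used for illustration.
symbols_by_file = {
    "auth.py": {"auth.user_login"},
    "api.py": {"api.login_view"},
    "util.py": {"util.slugify"},
}
callers = {"auth.user_login": {"api.login_view"}}

hashes: dict[str, str] = {}
files = {"auth.py": "v1", "api.py": "v1", "util.py": "v1"}
detect_changes(files, hashes)           # first pass: every file is new

files["auth.py"] = "v2"                 # simulate an edit to one file
changed = detect_changes(files, hashes)
to_reindex = affected_symbols(changed, symbols_by_file, callers)

if __name__ == "__main__":
    print(sorted(changed), sorted(to_reindex))
```

Only the entries in to_reindex would then be re-parsed and re-embedded. The sketch does not handle deleted files, which a real implementation would also need to remove from each index.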
Query Examples
# Semantic search
codegraph search "authentication middleware"
# Find callers
codegraph relations --callers user_login
# Hybrid search
codegraph search "error handling" --type function --lang python
CodeGraph has become an essential tool in my daily development workflow.