NASA Uses Graph Databases and LLMs to Enhance People Analytics
NASA employs graph databases and LLMs to analyze employee data, identify experts, and build teams. The system uses Memgraph, Ollama, and several internal data sources to extract skills and detect project overlaps. It powers a RAG-based chatbot, and the team plans to scale the graph further.
- Introduction to NASA's People Graph: An initiative using graph databases and LLMs to transform people analytics at NASA.
- Purpose: Identify top experts, form high-performing teams, and plan for future skills.
- Challenge: Traditional relational databases struggle with complex relationships in large organizations like NASA.
- Solution: Graph databases connect people to skills, projects, and career paths, enabling direct queries about expertise and skill gaps.
- Key Technology Stack: Memgraph graph database, self-hosted Ollama LLM server (on an AWS EC2 instance), AWS S3, GQLAlchemy.
- Data Sources: Personnel Data Warehouse, AI Use Case Registry, Team Resumes.
- Skill Extraction: LLMs served through Ollama process resume data to extract skills without manual tagging.
- Project Similarity: Cosine similarity is computed between project descriptions to identify related projects.
- Graph Schema: Labeled property graph with nodes for Employees, Skills, Projects, Organizations, etc. All nodes are labeled "Entity" to support vector indexing and GraphRAG.
- Applications: Subject Matter Experts Finder, Leadership Reports (workforce analysis), Project Overlap detection.
- Chatbot Interface: A RAG-based chatbot allows natural language queries on the graph.
- RAG Pipeline: The pipeline extracts key information from the question, then runs "Modified Pivot Search" and "Relevance Expansion" to retrieve context triplets (start node, end node, relationship). The triplets are passed to the LLM (Ollama) together with the original question to generate a response.
- Embeddings: Stored directly in Memgraph as node properties with vector search indices.
- Current Scale: ~27K nodes and 230K edges.
- Future Vision: Scale the graph to over 500,000 nodes and millions of edges, improve data quality, automate pipelines, and expand data sources.
- Key Benefit (Memgraph): Cost-effective labeled property graph solution that uses Cypher and integrates with Python, suitable for large-scale, complex data like NASA's.
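
The project-similarity step can be illustrated with a minimal sketch. The notes do not say how project descriptions are turned into vectors, so this example assumes plain word-count vectors as a stand-in for whatever embeddings the real system uses. The function names and the 0.5 threshold are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count word tokens (assumed stand-in for real embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_projects(descriptions: dict, threshold: float = 0.5) -> list:
    """Return (project_a, project_b, score) for pairs scoring at or above the threshold."""
    vectors = {name: term_counts(text) for name, text in descriptions.items()}
    names = sorted(vectors)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine_similarity(vectors[first], vectors[second])
            if score >= threshold:
                pairs.append((first, second, score))
    # Highest-scoring pairs first, so likely overlaps surface at the top.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

Pairs that clear the threshold could then be written back to the graph as similarity edges between project nodes, which is one plausible way to support the project overlap detection described above.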
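
The last stage of the RAG pipeline, where context triplets and the original question are handed to the LLM, might look roughly like the sketch below. The prompt wording, the `Triplet` type, the model name `llama3`, and the localhost URL are all assumptions for illustration. The retrieval steps ("Modified Pivot Search" and "Relevance Expansion") are not shown because the notes do not describe how they work. The request uses Ollama's `/api/generate` endpoint with streaming disabled.

```python
import json
import urllib.request
from typing import NamedTuple


class Triplet(NamedTuple):
    """One context triplet, in the order the notes give: start node, end node, relationship."""
    start: str
    end: str
    relationship: str


def build_prompt(question: str, triplets: list) -> str:
    """Render each triplet as one fact line, then append the user's question."""
    facts = "\n".join(
        f"- ({t.start}) -[{t.relationship}]-> ({t.end})" for t in triplets
    )
    return (
        "Answer the question using only the graph facts below.\n"
        f"Graph facts:\n{facts}\n"
        f"Question: {question}\n"
    )


def ask_ollama(prompt: str, model: str = "llama3",
               url: str = "http://localhost:11434/api/generate") -> str:
    """Send the prompt to an Ollama server; model name and URL are assumed defaults."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Keeping prompt construction separate from the network call makes the prompt easy to inspect and test without a running Ollama server.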
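
A Subject Matter Experts Finder query could be sketched as follows. The `HAS_SKILL` relationship type, the `name` properties, and the exact `Employee` and `Skill` label spellings are hypothetical, since the notes list node types but not the actual schema. The function accepts any client exposing `execute_and_fetch(query, parameters)`, which is the method GQLAlchemy's `Memgraph` class provides.

```python
# Hypothetical schema: (:Employee {name})-[:HAS_SKILL]->(:Skill {name}).
EXPERTS_BY_SKILL = """
MATCH (e:Employee)-[:HAS_SKILL]->(s:Skill)
WHERE toLower(s.name) = toLower($skill)
RETURN DISTINCT e.name AS employee
ORDER BY employee
"""


def find_experts(client, skill: str) -> list:
    """Return names of employees linked to the given skill (case-insensitive match).

    `client` is any object with an execute_and_fetch(query, parameters) method,
    for example gqlalchemy.Memgraph(host="127.0.0.1", port=7687).
    """
    rows = client.execute_and_fetch(EXPERTS_BY_SKILL, {"skill": skill})
    return [row["employee"] for row in rows]
```

Passing the skill as a query parameter rather than formatting it into the Cypher string avoids injection problems and lets the same query text be reused for every lookup.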