NASA People Graph: AI-powered Workforce Analytics - ListenHub

ListenHub

Apr 29

From memgraph

NASA Uses Graph Databases and LLMs to Enhance People Analytics

NASA employs graph databases and LLMs to analyze employee data, identify experts, and build teams. The system combines Memgraph, Ollama, and several data sources to extract skills and detect project overlaps. It powers a RAG-based chatbot, and the team plans to scale the graph.

Introduction to NASA's People Graph: An initiative using graph databases and LLMs to transform people analytics at NASA.
Purpose: Identify top experts, form high-performing teams, and plan for future skills.
Challenge: Traditional relational databases struggle with complex relationships in large organizations like NASA.
Solution: Graph databases connect people to skills, projects, and career paths, enabling direct queries about expertise and skill gaps.
Key Technology Stack: Memgraph graph database, Ollama LLM server (self-hosted on AWS EC2), AWS S3, GQLAlchemy.
Data Sources: Personnel Data Warehouse, AI Use Case Registry, Team Resumes.
Skill Extraction: LLMs (Ollama) process resume data to extract skills without manual tagging.
Project Similarity: Cosine similarity is computed between project descriptions to identify related projects.
Graph Schema: Labeled property graph with nodes for Employees, Skills, Projects, Organizations, etc., all labeled "Entity" for vector indexing and GraphRAG.
Applications: Subject Matter Experts Finder, Leadership Reports (workforce analysis), Project Overlap detection.
Chatbot Interface: A RAG-based chatbot allows natural language queries on the graph.
RAG Pipeline: Extracts key info from questions, performs "Modified Pivot Search" and "Relevance Expansion" to get context triplets (start node, end node, relationship), which are fed to the LLM (Ollama) with the original question to generate a response.
Embeddings: Stored directly in Memgraph as node properties with vector search indices.
Current Scale: ~27K nodes and 230K edges.
Future Vision: Scale the graph to over 500,000 nodes and millions of edges, improve data quality, automate pipelines, and expand data sources.
Key Benefit (Memgraph): Cost-effective labeled property graph solution using Cypher, with Python integration, suitable for large-scale, complex data like NASA's.
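
The kind of expertise query the summary describes can be sketched as a parameterized Cypher query built from Python. This is only an illustration: the labels `Employee` and `Skill`, the relationship type `HAS_SKILL`, and the `name` property are assumptions, not NASA's actual schema. In practice the query would be sent to Memgraph through a client such as GQLAlchemy.

```python
from typing import Dict, List, Tuple


def build_expert_query(skills: List[str]) -> Tuple[str, Dict[str, object]]:
    """Return a Cypher query and parameters that find employees holding ALL given skills.

    The schema (Employee, Skill, HAS_SKILL, name) is hypothetical.
    """
    if not skills:
        raise ValueError("at least one skill is required")
    unique = sorted(set(skills))  # drop repeats so the count check is exact
    query = (
        "MATCH (e:Employee)-[:HAS_SKILL]->(s:Skill) "
        "WHERE s.name IN $skills "
        "WITH e, count(DISTINCT s) AS matched "
        "WHERE matched = $required "
        "RETURN e.name AS name "
        "ORDER BY name"
    )
    return query, {"skills": unique, "required": len(unique)}


query, params = build_expert_query(["machine learning", "climate modeling"])
print(query)
print(params)
```

Passing the skill list as a parameter rather than formatting it into the query string keeps the query text constant and avoids quoting problems in skill names.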

Script

Mia: Okay, so let's dive right in. Today we're talking about NASA's People Graph. I gotta be honest, the name alone sounds like something out of a sci-fi movie. What is it exactly? I mean, at a high level, what are they trying to do? Find the right people for the right jobs, build dream teams or something?

Mars: Yeah, pretty much! Think of it like this: imagine a giant spider web, but instead of spiders, it's all the NASA employees. Their skills, the projects they've worked on, even the different departments they're in – everything's connected. So instead of digging through Excel sheets and clunky databases, the system can look and say, "Hey, Alice here is good at orbital mechanics and volunteered on Mars rover work back in 2019."

Mia: Woah, hold on. You're saying they ditched the good old relational databases? Like, SQL and all that jazz?

Mars: Basically, yeah. In an organization as huge as NASA, relationships get super complicated super fast. Say you want to figure out who on the West Coast knows machine learning well and has worked on climate models. With SQL, that can be like untangling earbuds in your pocket. Graph databases make finding that answer way easier – almost like a one-click thing.

Mia: Okay, I’m starting to get it. And the secret sauce is mixing that with, like, those giant language AI things that can read everything?

Mars: Exactly! They're using Memgraph for the graph database part, and then Ollama, an LLM server they run themselves in the cloud, to actually read through documents like resumes or project descriptions. The AI can look at a CV and automatically pull out key skills. So no one has to go in and tag things like Python or remote sensing by hand.
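
As a rough sketch of the skill-extraction step: prompt a model for a JSON list of skills, then clean up the reply. The prompt wording, the `extract_skills` function, and the `generate` callable are all assumptions for illustration. A real setup would point `generate` at the self-hosted LLM server, while a stand-in is used here so the example runs offline.

```python
import json
from typing import Callable, List

PROMPT_TEMPLATE = (
    "Extract the professional skills mentioned in the resume below. "
    "Reply with a JSON array of short skill names and nothing else.\n\n"
    "Resume:\n{resume}"
)


def extract_skills(resume_text: str, generate: Callable[[str], str]) -> List[str]:
    """Ask an LLM for skills and return a cleaned, de-duplicated list.

    `generate` is any function that maps a prompt to the model's text reply.
    """
    reply = generate(PROMPT_TEMPLATE.format(resume=resume_text))
    try:
        raw = json.loads(reply)
    except json.JSONDecodeError:
        return []  # the model did not return valid JSON; skip this resume
    if not isinstance(raw, list):
        return []
    seen = set()
    skills = []
    for item in raw:
        if not isinstance(item, str):
            continue
        name = item.strip()
        key = name.lower()  # treat "Python" and "python" as the same skill
        if name and key not in seen:
            seen.add(key)
            skills.append(name)
    return skills


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical reply)."""
    return '["Python", "remote sensing", "python "]'


print(extract_skills("Built remote sensing pipelines in Python.", fake_llm))
# -> ['Python', 'remote sensing']
```

Validating and normalizing the reply matters because model output is free text: without it, near-duplicate skill nodes would pile up in the graph.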

Mia: Wow, that’s slick! So, instead of someone having to manually update a database, the AI just reads and tags everything on its own?

Mars: Yup. And it gets even cooler: they also compute how similar projects are from their descriptions, using cosine similarity. Let's say a Mars sample return project and a lunar ice study come out about 80% similar. Then you can easily see that those projects are related.
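
To make the similarity idea concrete, here is a minimal cosine similarity sketch over two project descriptions. It uses plain word counts as the vectors, which is a simplification chosen for illustration. The source does not say how the descriptions are vectorized, and the sample texts are made up.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up project descriptions for illustration only.
mars = term_counts("Return drilled rock samples from the Mars surface")
lunar = term_counts("Analyze drilled ice samples from the lunar surface")
budget = term_counts("Quarterly budget forecast for facility maintenance")

print(cosine_similarity(mars, lunar))   # related projects score well above zero
print(cosine_similarity(mars, budget))  # no shared words, so the score is 0.0
```

The same function works unchanged on dense embedding vectors if they are represented as index-to-value mappings. Only the way the vectors are produced would differ.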

Mia: I love that. It’s kind of like how Spotify suggests music, right? “Because you listened to A, you might like B.”

Mars: Exactly! The whole thing – employees, skills, projects – all that sits inside this labeled property graph. And they use these things called vector embeddings inside Memgraph, so searching for similar things is super, super fast.

Mia: So, who is actually using this and how? Is it just some dashboard with graphs?

Mars: They built a chatbot interface based on RAG, retrieval-augmented generation. So you can type in something like, "Who's the best solar physicist at the Jet Propulsion Lab who's really good with C++ coding?" The system pulls the key information out of the question, searches the graph for relevant triplets (a start node, an end node, and the relationship connecting them), and passes those to the LLM along with the original question to send back a human-friendly response.
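
The last step of that pipeline, handing retrieved triplets plus the original question to the LLM, can be sketched as simple prompt assembly. The retrieval itself (the "Modified Pivot Search" and "Relevance Expansion" steps) is not shown because the source does not describe how it works. The `Triplet` type, the prompt wording, and the sample data are assumptions.

```python
from typing import List, NamedTuple


class Triplet(NamedTuple):
    start: str
    relationship: str
    end: str


def triplets_to_context(triplets: List[Triplet]) -> str:
    """Render each triplet as one readable line, dropping exact duplicates."""
    lines = []
    for t in dict.fromkeys(triplets):  # preserves order, removes repeats
        lines.append(f"({t.start}) -[{t.relationship}]-> ({t.end})")
    return "\n".join(lines)


def build_prompt(question: str, triplets: List[Triplet]) -> str:
    """Combine retrieved graph facts with the user's original question."""
    return (
        "Answer the question using only the graph facts below.\n\n"
        f"Graph facts:\n{triplets_to_context(triplets)}\n\n"
        f"Question: {question}"
    )


# Hypothetical retrieval result; names and relationship types are invented.
retrieved = [
    Triplet("Alice", "HAS_SKILL", "C++"),
    Triplet("Alice", "WORKED_ON", "Solar Wind Study"),
    Triplet("Alice", "HAS_SKILL", "C++"),
]
print(build_prompt("Who knows C++ and has solar physics experience?", retrieved))
```

Restricting the model to the listed facts is one common way to keep answers grounded in what the graph actually contains.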

Mia: Wow, that’s pretty neat. Kinda like having a research assistant who knows everybody's resume inside and out.

Mars: Exactly! Right now, they're at about 27,000 nodes and 230,000 edges, which is small for NASA! But they're planning to scale it up to over half a million nodes and millions of edges, pull in more data sources, and automate the pipelines.

Mia: Any big challenges they are trying to overcome?

Mars: Data quality is always the biggest challenge. You have to clean weird, inconsistent information and get rid of fake or look-alike accounts. They would also like to feed in things like meeting notes, papers, and even Slack messages. But privacy is really important so they have to be careful with what they are feeding in!

Mia: Sounds like the future HR wizard! Before we wrap up, what’s the biggest benefit of all this, in your opinion?

Mars: Cost-effective deep analysis. The Memgraph graph system is awesome, plus the Cypher language and Python hooks mean you don't need to be a computer PhD to get some amazing data analytics. It lets you break down workforce data in super meaningful ways.

Mia: That’s awesome. So NASA’s basically turning its talent pool into a living, breathing knowledge graph. Thanks for walking us through it!

Mars: My pleasure. Always fun to geek out on people graphs with you.