
ListenHub
5-14
Building Knowledge Graphs from Documents with LLM using CocoIndex
Key Concepts:
- CocoIndex: A tool for building a knowledge graph and keeping it up to date as the source documents change.
- LLM (Large Language Model): Used to extract relationships between concepts in documents.
- Knowledge Graph: A structured representation of knowledge, showing relationships between entities.
Types of Relationships Extracted:
- Subject-Object Relationships: E.g., "CocoIndex supports Incremental Processing."
- Entity Mentions: E.g., "core/basics.mdx" mentions CocoIndex and Incremental Processing.
Prerequisites:
- PostgreSQL (for CocoIndex's incremental processing)
- Neo4j (graph database)
- OpenAI API key (or Ollama for local LLM models)
Data Flow:
- Add Documents as Source: CocoIndex documentation markdown files (.md, .mdx).
- Add Data Collectors:
  - document_node: Collects documents (e.g., core/basics.mdx).
  - entity_relationship: Collects relationships (e.g., "CocoIndex supports Incremental Processing").
  - entity_mention: Collects mentions of entities in a document (e.g., core/basics.mdx mentions CocoIndex).
- Process Each Document and Extract Summary: Use cocoindex.functions.ExtractByLlm with gpt-4o to get the document title and summary.
- Extract Relationships: Use cocoindex.functions.ExtractByLlm to extract relationships (subject, predicate, object) from the document.
- Collect Relationships: Collect relationships between subjects and objects, as well as mentions of entities in the document.
- Build Knowledge Graph: Export nodes and relationships to Neo4j.
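To make the flow above concrete without a database or API key, here is a plain-Python sketch of the same steps. It is not CocoIndex code: `fake_extract` is a hypothetical stand-in for both `ExtractByLlm` calls (it only recognizes lines of the form "X supports Y"), and the collector field names (`filename`, `title`, `summary`, `subject`, `predicate`, `object`, `entity`, `id`) are assumptions chosen to line up with the export snippets further down. It also assumes that both the subject and the object of a relationship count as entities mentioned by the document, which matches the core/basics.mdx example.

```python
import uuid


def fake_extract(content: str) -> tuple[str, str, list[tuple[str, str, str]]]:
    """Hypothetical stand-in for both ExtractByLlm calls.

    Returns (title, summary, relationships); a relationship is read from any
    line of the form "<subject> supports <object>".
    """
    lines = content.splitlines()
    relationships = []
    for line in lines:
        if " supports " in line:
            subject, obj = line.split(" supports ", 1)
            relationships.append((subject.strip(), "supports", obj.strip().rstrip(".")))
    # Title: first line without markdown heading marks; summary: a truncated prefix.
    return lines[0].lstrip("# "), content[:80], relationships


def run_flow(documents: dict[str, str]) -> dict[str, list[dict]]:
    """Fill the three collectors from a {filename: content} mapping."""
    document_node: list[dict] = []
    entity_relationship: list[dict] = []
    entity_mention: list[dict] = []
    for filename, content in documents.items():
        title, summary, relationships = fake_extract(content)
        document_node.append({"filename": filename, "title": title, "summary": summary})
        for subject, predicate, obj in relationships:
            entity_relationship.append(
                {"id": str(uuid.uuid4()), "subject": subject, "predicate": predicate, "object": obj}
            )
            # Assumption: both ends of a relationship count as mentions by this document.
            for entity in (subject, obj):
                entity_mention.append({"id": str(uuid.uuid4()), "filename": filename, "entity": entity})
    return {
        "document_node": document_node,
        "entity_relationship": entity_relationship,
        "entity_mention": entity_mention,
    }


collectors = run_flow({"core/basics.mdx": "# Basics\nCocoIndex supports Incremental Processing."})
```

In the real flow the LLM replaces `fake_extract`, and CocoIndex, not a Python dict, holds the collected rows and keeps them in sync as files change.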
Building the Knowledge Graph in Neo4j:
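- Nodes: Represent entities and documents. Each node needs a label (here Document or Entity) and a primary key field.
- Relationships: Connect two nodes with a relationship type (here RELATIONSHIP between two Entity nodes, and MENTION from a Document to an Entity).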
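Why each node needs a primary key: rows from different documents that name the same entity should land on one node, not several. A minimal in-memory sketch, assuming a node is identified by its (label, primary-key value) pair, mirroring `filename` for Document and `value` for Entity in the export configuration. `Graph`, `upsert_node`, and `add_edge` are hypothetical names, not a Neo4j or CocoIndex API, and "core/example.mdx" is a made-up second file.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    # Nodes keyed by (label, primary key value) -> properties.
    nodes: dict[tuple[str, str], dict] = field(default_factory=dict)
    # Edges as (source node key, relationship type, target node key).
    edges: set[tuple[tuple[str, str], str, tuple[str, str]]] = field(default_factory=set)

    def upsert_node(self, label: str, key: str, **props) -> tuple[str, str]:
        """Create the node if it is missing, otherwise merge new properties into it."""
        node_key = (label, key)
        self.nodes.setdefault(node_key, {}).update(props)
        return node_key

    def add_edge(self, source: tuple[str, str], rel_type: str, target: tuple[str, str]) -> None:
        self.edges.add((source, rel_type, target))


g = Graph()
for filename in ["core/basics.mdx", "core/example.mdx"]:
    doc = g.upsert_node("Document", filename)
    # Both documents mention CocoIndex, but the key ("Entity", "CocoIndex") yields one node.
    ent = g.upsert_node("Entity", "CocoIndex")
    g.add_edge(doc, "MENTION", ent)
```

The result has three nodes (two Document, one Entity) and two MENTION edges; without a key, a naive insert would have created a second CocoIndex node.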
Exporting to Neo4j:
- Configure Neo4j Connection: Specify URI, user, and password.
- Export Document Nodes: Export from the document_node collector, using filename as the primary key.
- Export RELATIONSHIP and Entity Nodes: Declare Entity nodes, then export relationships from the entity_relationship collector.
- Export entity_mention: Create Document nodes and Entity nodes and connect them with MENTION relationships.
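To see what the three exports amount to in graph terms, the sketch below turns collector rows into parameterized Cypher `MERGE` statements. It illustrates the mapping only and is not the query text CocoIndex actually issues; the `title`, `summary`, and `predicate` properties are assumptions. Because `MERGE` matches on the key property, an Entity created by the relationship export and one created by the mention export end up as the same node whenever their `value` matches.

```python
def document_node_query(row: dict) -> tuple[str, dict]:
    # Document nodes are keyed by filename.
    query = (
        "MERGE (d:Document {filename: $filename}) "
        "SET d.title = $title, d.summary = $summary"
    )
    return query, {"filename": row["filename"], "title": row["title"], "summary": row["summary"]}


def entity_relationship_query(row: dict) -> tuple[str, dict]:
    # Subject and object both map onto Entity nodes keyed by `value`.
    query = (
        "MERGE (s:Entity {value: $subject}) "
        "MERGE (o:Entity {value: $object}) "
        "MERGE (s)-[r:RELATIONSHIP {id: $id}]->(o) "
        "SET r.predicate = $predicate"
    )
    return query, {k: row[k] for k in ("id", "subject", "object", "predicate")}


def entity_mention_query(row: dict) -> tuple[str, dict]:
    # A MENTION edge runs from the Document to the Entity it mentions.
    query = (
        "MERGE (d:Document {filename: $filename}) "
        "MERGE (e:Entity {value: $entity}) "
        "MERGE (d)-[m:MENTION {id: $id}]->(e)"
    )
    return query, {k: row[k] for k in ("id", "filename", "entity")}
```

Each (query, params) pair could be run with a driver such as the official `neo4j` Python package, e.g. `session.run(query, params)`.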
Code Snippets:
- Adding a source:
  data_scope["documents"] = flow_builder.add_source(cocoindex.sources.LocalFile(path="../../docs/docs/core", included_patterns=["*.md", "*.mdx"]))
- Extracting summary:
  doc["summary"] = doc["content"].transform(cocoindex.functions.ExtractByLlm(llm_spec=cocoindex.LlmSpec(api_type=cocoindex.LlmApiType.OPENAI, model="gpt-4o"), output_type=DocumentSummary, instruction="Please summarize the content of the document."))
- Extracting relationships:
  doc["relationships"] = doc["content"].transform(cocoindex.functions.ExtractByLlm(llm_spec=cocoindex.LlmSpec(api_type=cocoindex.LlmApiType.OPENAI, model="gpt-4o"), output_type=list[Relationship], instruction=("Please extract relationships from CocoIndex documents. " "Focus on concepts and ignore examples and code. ")))
- Exporting Document nodes to Neo4j:
  document_node.export("document_node", cocoindex.storages.Neo4j(connection=conn_spec, mapping=cocoindex.storages.Nodes(label="Document")), primary_key_fields=["filename"])
- Exporting relationships and entities to Neo4j:
  entity_relationship.export("entity_relationship", cocoindex.storages.Neo4j(connection=conn_spec, mapping=cocoindex.storages.Relationships(rel_type="RELATIONSHIP", source=cocoindex.storages.NodeFromFields(label="Entity", fields=[cocoindex.storages.TargetFieldMapping(source="subject", target="value")]), target=cocoindex.storages.NodeFromFields(label="Entity", fields=[cocoindex.storages.TargetFieldMapping(source="object", target="value")]))), primary_key_fields=["id"])
- Exporting the entity_mention collector to Neo4j:
  entity_mention.export("entity_mention", cocoindex.storages.Neo4j(connection=conn_spec, mapping=cocoindex.storages.Relationships(rel_type="MENTION", source=cocoindex.storages.NodeFromFields(label="Document", fields=[cocoindex.storages.TargetFieldMapping("filename")]), target=cocoindex.storages.NodeFromFields(label="Entity", fields=[cocoindex.storages.TargetFieldMapping(source="entity", target="value")]))), primary_key_fields=["id"])
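The snippets pass `DocumentSummary` and `Relationship` as `output_type` but never define them. A minimal sketch, assuming plain dataclasses whose fields follow the text (a title and summary per document; subject, predicate, and object per relationship); the original example may define them differently.

```python
import dataclasses


@dataclasses.dataclass
class DocumentSummary:
    """Title and summary of a document."""
    title: str
    summary: str


@dataclasses.dataclass
class Relationship:
    """A relationship between two entities, e.g. CocoIndex supports Incremental Processing."""
    subject: str
    predicate: str
    object: str


example = Relationship("CocoIndex", "supports", "Incremental Processing")
```

Keeping the types this small keeps the extraction target simple: one summary object per document and a flat list of triples per document.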
Running the Index:
- Install dependencies:
  pip install -e .
- Set up and update the index:
  python main.py cocoindex setup
  python main.py cocoindex update
Querying the Knowledge Graph:
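You can explore the knowledge graph in Neo4j Browser using Cypher queries. For example, the following query returns every relationship in the graph together with its two endpoint nodes: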
MATCH p=()-->() RETURN p
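The same graph can also be queried from Python. A sketch using the official `neo4j` driver package (an assumption; it is not among the listed prerequisites), with a query that lists the entities a given document mentions. It relies on the `Document.filename`, `Entity.value`, and `MENTION` names from the export mapping; the URI and credentials are placeholders to replace with your own connection settings.

```python
MENTIONED_ENTITIES = (
    "MATCH (d:Document {filename: $filename})-[:MENTION]->(e:Entity) "
    "RETURN DISTINCT e.value AS entity ORDER BY entity"
)


def entities_mentioned_in(
    filename: str,
    uri: str = "bolt://localhost:7687",  # placeholder URI
    auth: tuple[str, str] = ("neo4j", "password"),  # placeholder credentials
) -> list[str]:
    """Return the values of Entity nodes that the given Document node MENTIONs."""
    # Imported lazily so this module loads even where the driver isn't installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=auth) as driver:
        records, _, _ = driver.execute_query(MENTIONED_ENTITIES, filename=filename)
    return [record["entity"] for record in records]
```

Usage, once the index has been built: `entities_mentioned_in("core/basics.mdx")`.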