# Langfuse Integration with a MariaDB Vector Store
## Overview

This document describes a Python script (`main.py`) that integrates Langfuse tracing with a MariaDB vector store using LangChain and Sentence Transformers. The script demonstrates how to:
- Initialize a Sentence Transformer model for embeddings.
- Set up Langfuse for tracing application logic.
- Configure a MariaDB vector store.
- Add documents to the vector store and perform a similarity search.
- Log results to Langfuse for observability.
## Architecture Flow
## Prerequisites

To run the script, ensure the following are installed and configured:

- **Python 3.8+**
- **Dependencies**: `langfuse`, `langchain-mariadb`, `langchain-community`, `sentence-transformers`, `python-dotenv`
- **MariaDB**: A running MariaDB instance with a database named `langchain`.
- **Environment Variables**: Create a `.env` file in the project root with the following:

  ```
  LANGFUSE_PUBLIC_KEY=<your-langfuse-public-key>
  LANGFUSE_SECRET_KEY=<your-langfuse-secret-key>
  MARIADB_USER=<your-mariadb-username>
  MARIADB_PASSWORD=<your-mariadb-password>
  ```

- **Langfuse Account**: Sign up at [Langfuse](https://cloud.langfuse.com) to obtain `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY`.
## Installation

1. Clone the repository or copy the script to your local environment.
2. Install the required Python packages:

   ```bash
   pip install langfuse langchain-mariadb langchain-community sentence-transformers python-dotenv
   ```

3. Set up the `.env` file with your credentials.
4. Ensure MariaDB is running and accessible at `localhost` with the database `langchain` created.
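As a quick sanity check after step 3, the presence of the required variables can be verified before running the script. The helper below is an illustrative sketch, not part of `main.py`:

```python
import os

# The four variables main.py expects to find in the .env file.
REQUIRED_VARS = [
    "LANGFUSE_PUBLIC_KEY",
    "LANGFUSE_SECRET_KEY",
    "MARIADB_USER",
    "MARIADB_PASSWORD",
]

def missing_vars(env):
    """Return the names of required variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# After load_dotenv() has run, os.environ can be checked directly:
# problems = missing_vars(os.environ)
# if problems:
#     raise SystemExit(f"Missing environment variables: {problems}")
```

Failing fast here gives a clearer error message than the connection or authentication failures that would otherwise surface later.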
## Code Structure

The script (`main.py`) is structured as follows:
### 1. Importing Dependencies

```python
import os
from langfuse import get_client
from langchain_mariadb import MariaDBStore
from sentence_transformers import SentenceTransformer
from langchain_core.documents import Document
from langchain_community.embeddings import HuggingFaceEmbeddings
from dotenv import load_dotenv
```

- `os` and `dotenv`: Load environment variables.
- `langfuse`: Provides the Langfuse SDK for tracing.
- `langchain_mariadb`: MariaDB vector store integration with LangChain.
- `sentence_transformers` and `langchain_community`: Generate embeddings using the `all-MiniLM-L6-v2` model.
- `langchain_core.documents`: Defines the `Document` class for text storage.
### 2. Loading Environment Variables

```python
load_dotenv()
```

Loads environment variables from the `.env` file for secure access to credentials.
### 3. Initializing the Sentence Transformer Model

```python
model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
```

- Uses the `all-MiniLM-L6-v2` model to generate 384-dimensional embeddings for text documents.
- The model is lightweight and optimized for semantic text similarity.
### 4. Setting Up Langfuse

```python
os.environ["LANGFUSE_PUBLIC_KEY"] = os.getenv("LANGFUSE_PUBLIC_KEY")
os.environ["LANGFUSE_SECRET_KEY"] = os.getenv("LANGFUSE_SECRET_KEY")
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com"

langfuse = get_client()

if langfuse.auth_check():
    print("Langfuse client is authenticated and ready!")
else:
    print("Authentication failed. Check your credentials.")
```

- Configures Langfuse with the public and secret keys from environment variables.
- Sets the Langfuse host to the cloud instance.
- Verifies authentication with Langfuse.
5. Configuring the MariaDB Vector Store
url = f"mariadb+mariadbconnector://{os.getenv('MARIADB_USER')}:{os.getenv('MARIADB_PASSWORD')}@localhost/langchain"
vectorstore = MariaDBStore(
embeddings=model,
embedding_length=384,
datasource=url,
collection_name="my_docs",
)- Constructs a MariaDB connection string using environment variables.
- Initializes a
MariaDBStoreinstance with:- The Sentence Transformer model for embeddings.
- Embedding length of 384 (specific to
all-MiniLM-L6-v2). - Connection to the
langchaindatabase. - A collection named
my_docsfor storing documents.
6. Application Logic with Langfuse Tracing
with langfuse.start_as_current_span(name="mariadb-trace") as span:
vectorstore.add_documents(
[
Document(page_content="The sun is a star."),
Document(page_content="The moon is a natural satellite.")
]
)
results = vectorstore.similarity_search("Tell me about celestial bodies.")
span.update_trace(
metadata={"query": "Tell me about celestial bodies."}
)
print(f"Search results: {results}")- Starts a Langfuse trace named
mariadb-trace. - Adds two sample documents to the vector store.
- Performs a similarity search with the query “Tell me about celestial bodies.”
- Logs the query metadata to the Langfuse trace.
- Prints the search results.
### 7. Flushing Langfuse Data

```python
langfuse.flush()
```

Ensures all trace data is sent to the Langfuse server before the script exits.
## Usage

1. Ensure MariaDB is running and the `langchain` database exists.
2. Populate the `.env` file with your Langfuse and MariaDB credentials.
3. Run the script:

   ```bash
   python main.py
   ```

4. Expected output:
   - Confirmation of Langfuse authentication.
   - Search results from the MariaDB vector store, e.g.:

     ```
     Langfuse client is authenticated and ready!
     Search results: [Document(page_content='The sun is a star.'), Document(page_content='The moon is a natural satellite.')]
     ```

5. Check the Langfuse dashboard (https://cloud.langfuse.com) for trace details under the `mariadb-trace` span.
## Notes

- **Embedding Model**: The `all-MiniLM-L6-v2` model is used for its balance of performance and efficiency. Other models can be used by changing the `model_name` parameter, but ensure `embedding_length` matches the model's output dimension.
- **MariaDB**: Ensure the database user has appropriate permissions to create and modify tables in the `langchain` database.
- **Langfuse**: Traces are logged to the Langfuse cloud instance. Ensure your credentials are valid to avoid authentication errors.
- **Error Handling**: The script includes a basic authentication check for Langfuse. Additional error handling (e.g., for MariaDB connection failures) can be added as needed.
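To keep `embedding_length` in sync when swapping models, the dimension can be looked up rather than hard-coded. A minimal sketch; `embedding_dim` and `KNOWN_DIMS` are illustrative helpers (the dimensions come from the models' published cards), not part of `main.py`:

```python
# Published output dimensions for two common sentence-transformers models.
KNOWN_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "all-mpnet-base-v2": 768,
}

def embedding_dim(model_name):
    """Return the embedding dimension for a known model name."""
    return KNOWN_DIMS[model_name]

# Alternatively, ask the model itself (downloads the model on first use):
# from sentence_transformers import SentenceTransformer
# dim = SentenceTransformer(model_name).get_sentence_embedding_dimension()
```

The returned value can then be passed straight to `MariaDBStore(embedding_length=...)`, so the store and the model can never drift apart.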
## Troubleshooting

- **Langfuse Authentication Failure**:
  - Verify `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in the `.env` file.
  - Ensure the Langfuse host URL is correct.
- **MariaDB Connection Issues**:
  - Check that MariaDB is running and accessible at `localhost`.
  - Confirm that `MARIADB_USER` and `MARIADB_PASSWORD` are correct.
  - Ensure the `langchain` database exists.
- **No Search Results**:
  - Verify that documents were added successfully to the vector store.
  - Ensure the query is relevant to the stored documents.
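When a search returns nothing useful, retrieving distances alongside documents can show whether the query is simply too far from the stored texts. Many LangChain vector stores expose `similarity_search_with_score` for this; the threshold filter below is an illustrative helper (not part of `main.py`) and assumes the store returns a distance where lower means more similar:

```python
def filter_by_distance(scored_results, max_distance=0.8):
    """Keep only (document, score) pairs within the distance cutoff.

    Assumes lower score = closer match, as with cosine-distance stores.
    """
    return [(doc, score) for doc, score in scored_results if score <= max_distance]

# Typical usage against the store from main.py:
# scored = vectorstore.similarity_search_with_score("Tell me about celestial bodies.")
# close = filter_by_distance(scored, max_distance=0.8)
```

Printing the raw `scored` list before filtering is often enough to tell an empty store apart from an off-topic query.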