Vector Database Instruction

Project Overview

This repository introduces the basic concepts of vector databases, with examples that use a local database such as ChromaDB. The project demonstrates how vector databases store and query high-dimensional data efficiently.

Approximately 80% of available information is unstructured (Onna), and vector databases play a crucial role in storing and querying this type of data. This repository covers the fundamentals of working with vector databases through hands-on examples and explanations.
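To make the store-and-query idea concrete, here is a minimal sketch of what a vector store does, written in plain Python rather than with ChromaDB. The `TinyVectorStore` class, its method names, the three-dimensional vectors, and the example titles are all hypothetical and made up for illustration. A real database such as ChromaDB would add persistence, indexing, and embedding generation on top of this basic pattern.

```python
import math


class TinyVectorStore:
    """Hypothetical in-memory store: keeps vectors with their text and searches by brute force."""

    def __init__(self):
        self._items = {}  # item_id -> (vector, text)

    def add(self, item_id, vector, text):
        # Store a copy of the vector alongside the original text.
        self._items[item_id] = (list(vector), text)

    def query(self, vector, n_results=1):
        # Score every stored item by Euclidean distance to the query (smaller is closer).
        scored = [
            (math.dist(vector, stored_vector), item_id, text)
            for item_id, (stored_vector, text) in self._items.items()
        ]
        scored.sort(key=lambda row: row[0])
        return [(item_id, text) for _, item_id, text in scored[:n_results]]


store = TinyVectorStore()

# Made-up 3-dimensional "embeddings"; real embeddings come from a model
# and typically have hundreds of dimensions.
store.add("a", [0.9, 0.1, 0.0], "Intro to neural networks")
store.add("b", [0.0, 0.8, 0.2], "Baking sourdough at home")
store.add("c", [0.8, 0.2, 0.1], "Deep learning for beginners")

# Ask for the two stored items closest to the query vector.
nearest = store.query([1.0, 0.0, 0.0], n_results=2)
print(nearest)
```

The brute-force scan compares the query against every stored vector, which is fine for a handful of items but is the part a dedicated vector database is designed to speed up.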

Final Outcome

The repository provides an introduction to vector databases, demonstrating how data is stored and queried efficiently. It includes examples of working with a Medium Post Titles dataset and using ChromaDB for vector storage. The project also covers several similarity calculations, including Euclidean distance, cosine similarity, and Manhattan distance, to explore how vector similarity is measured.
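The three measures named above can be sketched in a few lines of plain Python. This is an illustrative version using the standard definitions, not necessarily the code used in the repository. The function names and the sample vectors `u` and `v` are assumptions chosen for the example, and the cosine function assumes neither vector is all zeros.

```python
import math


def euclidean_distance(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    # Sum of absolute differences along each dimension.
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Cosine of the angle between the vectors; depends on direction, not length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical sample vectors: v points in the same direction as u but is twice as long.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]

print("Euclidean:", euclidean_distance(u, v))
print("Manhattan:", manhattan_distance(u, v))
print("Cosine:", cosine_similarity(u, v))
```

Because `v` is a scaled copy of `u`, the cosine similarity comes out at approximately 1 while both distances are clearly non-zero. This is why the choice of metric matters: cosine similarity ignores magnitude, whereas Euclidean and Manhattan distance do not.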

This project serves as a foundation for understanding vector databases and can be extended with more advanced similarity metrics and further datasets.