The chatbot built for Donat combines several technologies to provide efficient document processing, multilingual support, and integration with Donat’s existing infrastructure. The key technologies are:
- Google Drive for Knowledge Base:
The raw knowledge base lives on Google Drive and consists of documents in several formats, including Google Docs, Sheets, and PDFs. Non-technical staff can add new documents to the bot’s knowledge base, so expanding it requires no developer involvement.
- Document Processing and Translation:
Documents are processed to extract their text, particularly from PDFs, so that only relevant content is used. AI-powered machine translation then translates the documents into multiple languages, which enables multilingual support. Each language on Donat’s website has its own bot, and each bot has its own knowledge base in that language.
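The one-bot-per-language setup can be pictured as a simple lookup from the site language to a knowledge base. The following is only a minimal sketch: the language codes, the knowledge base names, and the fallback behaviour are hypothetical and are not taken from the project.

```python
# Hypothetical language codes and knowledge base names, for illustration only.
KNOWLEDGE_BASES = {
    "sl": "KnowledgeBase_sl",
    "en": "KnowledgeBase_en",
    "de": "KnowledgeBase_de",
}


def knowledge_base_for(language: str, default: str = "en") -> str:
    """Return the knowledge base name for a site language.

    Regional variants such as "de-AT" are reduced to their base code, and
    unknown languages fall back to the default (an assumed behaviour).
    """
    code = language.strip().lower().split("-")[0]
    return KNOWLEDGE_BASES.get(code, KNOWLEDGE_BASES[default])
```

Keeping the mapping in one place means that adding a language amounts to adding one entry plus the translated documents behind it.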
- Chunking and Embeddings:
Documents are split into smaller pieces of text, a process known as chunking. Each chunk is then converted into an embedding, a numeric vector that allows text to be stored and retrieved by similarity of meaning. The chunks are stored in Weaviate, a vector database, which gives the bot quick access to the information relevant to a question.
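To make the chunking step concrete, here is a minimal sketch of one common approach: fixed-size word windows with some overlap between neighbouring chunks. The function name, the window size, and the overlap are assumptions chosen for illustration; the project's actual chunking strategy is not described here.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some context in the next chunk.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

Each returned chunk would then be passed to an embedding model and stored together with its vector. Smaller chunks make retrieval more precise, while larger ones carry more context, so the sizes are usually tuned per knowledge base.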
- Amazon Web Services (AWS):
All of the project’s logic and infrastructure is hosted on Amazon Web Services (AWS), which provides scalability, reliability, and security. Serverless compute, specifically AWS Lambda, runs tasks such as streaming responses. Streaming matters because generating a response can take some time: instead of waiting for the full answer, the website displays it gradually as it is produced, which keeps both the site and the chatbot responsive.
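The idea behind streaming can be sketched with a plain Python generator that hands over the answer in small pieces as they become available. This is an illustration only: it does not call any AWS API, and `stream_answer` and `flush_every` are hypothetical names. The token list stands in for the output of a language model.

```python
from typing import Iterable, Iterator


def stream_answer(tokens: Iterable[str], flush_every: int = 3) -> Iterator[str]:
    """Yield the answer in small pieces instead of waiting for the full text."""
    buffer = []
    for token in tokens:
        buffer.append(token)
        if len(buffer) >= flush_every:
            yield "".join(buffer)
            buffer.clear()
    # Send whatever is left once the token source is exhausted.
    if buffer:
        yield "".join(buffer)


if __name__ == "__main__":
    # The front end would append each piece to the chat window as it arrives.
    for piece in stream_answer(["Do", "nat ", "is ", "a ", "mineral ", "water."]):
        print(piece, end="", flush=True)
    print()
```

Batching a few tokens per piece is a trade-off: fewer, larger pieces reduce overhead, while smaller pieces make the text appear more smoothly.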
- Weaviate Cloud Services:
Weaviate, the vector database that stores the text embeddings, is hosted on Weaviate Cloud Services. This managed platform provides the features needed to manage and query vector data, and it complements the rest of the project’s infrastructure.
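What querying vector data means can be shown without a database at all. The sketch below ranks stored chunks by cosine similarity to a query vector in plain Python. It is not Weaviate's API, only an illustration of the kind of lookup a vector database performs at scale; the function names and the tiny two-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored chunks most similar to the query vector."""
    scored = [(cosine_similarity(query, vector), text) for text, vector in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

A real vector database avoids comparing the query against every stored vector by using an approximate nearest-neighbour index, which is what keeps lookups fast as the knowledge base grows.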