In the last post, Fabrice AI: The Technical Journey, I explained the journey of building Fabrice AI, which came full circle. I started by using Chat GPT 3 and 3.5. Disappointed with the results, I tried to use the Langchain framework to build my own AI model on top of it, before coming back to Chat GPT once Open AI started using vector databases and massively improved results with 4o.
Here is the current process for training Fabrice AI:
- The training data (blog posts, YouTube URLs, podcast URLs, PDF URLs, and image URLs) is stored in our WordPress database.
- We extract the data and structure it (a sketch of this step follows the list).
- We provide the structured data to Open AI for training using the Assistants API.
- Open AI then creates a vector store database and stores it.
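To make the extraction and structuring step concrete, here is a minimal sketch of what one of these scripts could look like for blog posts, assuming the standard WordPress REST API endpoint (/wp-json/wp/v2/posts); the field mapping and output layout are illustrative, not our exact scripts.

# Sketch: pull posts from the WordPress REST API and write one JSON file per post.
import json
import pathlib
import requests

WP_API = "https://fabricegrinda.com/wp-json/wp/v2/posts"  # assumed standard endpoint
OUT_DIR = pathlib.Path("structured")
OUT_DIR.mkdir(exist_ok=True)

page = 1
while True:
    resp = requests.get(WP_API, params={"per_page": 100, "page": page}, timeout=30)
    if resp.status_code == 400:  # WordPress returns 400 once you page past the last result
        break
    resp.raise_for_status()
    posts = resp.json()
    if not posts:
        break
    for post in posts:
        record = {
            "id": str(post["id"]),
            "date": post.get("date", ""),
            "link": post.get("link", ""),
            "title": {"rendered": post["title"]["rendered"]},
            "knowledge_type": "blog",
            # Cleaning, transcription, and token-limit checks happen in separate steps.
            "contentUpdated": post["content"]["rendered"],
        }
        path = OUT_DIR / f"{record['id']}.json"
        path.write_text(json.dumps(record, ensure_ascii=False, indent=2))
    page += 1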
Here is an example of a piece of structured data. Each piece of content has its own JSON file. We make sure not to exceed the 32,000-token limit.
{
  "id": "1",
  "date": " ",
  "link": "https://fabricegrinda.com/",
  "title": {
    "rendered": "What is Fabrice AI?"
  },
  "Category": "About Fabrice",
  "featured_media": "https://fabricegrinda.com/wp-content/uploads/2023/12/About-me.png",
  "other_media": "",
  "knowledge_type": "blog",
  "contentUpdated": "Fabrice AI is a digital representation of Fabrice’s thoughts based on his blog posts and select transcribed podcasts and interviews using ChatGPT. Given that many of the transcriptions are imperfectly transcribed and that the blog is but a limited representation of Fabrice the individual, we apologize for inaccuracies and missing information. Nonetheless, this is a good starting point to get Fabrice’s thoughts on many topics."
}
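Once the JSON files are structured, the upload to Open AI is straightforward. The sketch below shows the general shape of that step using the Open AI Python SDK’s Assistants API with the file_search tool; the store name, instructions, and model choice are placeholders rather than our exact configuration.

# Sketch: create a vector store, upload the structured files, and attach the
# store to an assistant via the file_search tool (OpenAI Python SDK, Assistants API).
import pathlib
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

vector_store = client.beta.vector_stores.create(name="fabrice-ai-knowledge")

files = [p.open("rb") for p in sorted(pathlib.Path("structured").glob("*.json"))]
client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id,
    files=files,
)

assistant = client.beta.assistants.create(
    name="Fabrice AI",
    model="gpt-4o",
    instructions="Answer as Fabrice, using only the attached knowledge files.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
print(assistant.id)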
This is the current technical implementation:
- The consumer-facing website is hosted on AWS Amplify.
- The integration between the public site and Open AI is done through an API layer, which is hosted on AWS as a Python API server.
- We use MongoDB as a log to store all the questions asked by the public, the answers given by Chat GPT, and the URLs of the sources (a sketch of this flow follows the list).
- We use various scripts to structure the data from the blog, YouTube, etc. to pass to Open AI for training.
- We use React Speech Recognition (a wrapper around the browser’s built-in speech recognition) to convert voice inquiries to text.
- We also use Google Analytics to track the website traffic.
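To illustrate how these pieces fit together, here is a minimal sketch of what the API layer might look like, assuming FastAPI and pymongo; the endpoint path, collection names, and environment variables are placeholders, not our actual implementation.

# Sketch of the API layer: take a question, run the answering assistant, and
# log the question and answer to MongoDB.
import datetime
import os

from fastapi import FastAPI
from openai import OpenAI
from pymongo import MongoClient

app = FastAPI()
openai_client = OpenAI()
qa_log = MongoClient(os.environ["MONGODB_URI"])["fabrice_ai"]["qa_log"]
ANSWER_ASSISTANT_ID = os.environ["ANSWER_ASSISTANT_ID"]

@app.post("/ask")
def ask(question: str):
    thread = openai_client.beta.threads.create(
        messages=[{"role": "user", "content": question}]
    )
    openai_client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=ANSWER_ASSISTANT_ID
    )
    # The most recent message in the thread is the assistant's answer.
    messages = openai_client.beta.threads.messages.list(thread_id=thread.id)
    answer = messages.data[0].content[0].text.value

    qa_log.insert_one({
        "question": question,
        "answer": answer,
        "timestamp": datetime.datetime.utcnow(),
        # The source URLs from the metadata assistant are stored here as well (see below).
    })
    return {"answer": answer}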
It’s important to note that we use two assistants:
- One for answering questions.
- One for retrieving metadata URLs, that is, the blog URLs containing the original content, so that we can display the sources at the bottom of the answers.
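Here is a sketch of that second step: once an answer has been produced, the metadata assistant is asked which blog URLs back it, and those URLs are rendered as sources. The assistant ID, the prompt wording, and the plain-URL output format are assumptions for illustration.

# Sketch: ask the metadata assistant for the blog URLs behind an answer.
import os
from openai import OpenAI

client = OpenAI()
METADATA_ASSISTANT_ID = os.environ["METADATA_ASSISTANT_ID"]

def get_source_urls(question: str, answer: str) -> list[str]:
    thread = client.beta.threads.create(messages=[{
        "role": "user",
        "content": (
            "Return the URLs of the blog posts that contain the original content "
            f"behind this answer, one per line.\n\nQuestion: {question}\n\nAnswer: {answer}"
        ),
    }])
    client.beta.threads.runs.create_and_poll(
        thread_id=thread.id, assistant_id=METADATA_ASSISTANT_ID
    )
    reply = client.beta.threads.messages.list(thread_id=thread.id).data[0]
    text = reply.content[0].text.value
    return [line.strip() for line in text.splitlines() if line.strip().startswith("http")]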
What next?
- Speech-to-Text Improvements
Open AI’s Whisper model for speech-to-text is more accurate than the browser-based recognition that React Speech Recognition relies on. It also supports multiple languages out of the box and is good at handling mixed-language speech, accents, and dialects. As a result, I will most likely move to it in the coming months. That said, it is more complex to set up, so it might be a while. You need to host the model, manage dependencies (e.g., Python, libraries), and ensure you have sufficient hardware for efficient performance. Also, Whisper isn’t designed for direct use in browsers: when building a web app, you need to create a backend service to handle the transcription, which adds complexity.
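For reference, that backend service does not have to be large. Here is a minimal sketch of a transcription endpoint using the open-source Whisper package (which also requires ffmpeg); the model size and endpoint path are placeholder choices.

# Sketch: a small backend service that accepts an audio upload and returns the transcript.
import tempfile

import whisper
from fastapi import FastAPI, UploadFile

app = FastAPI()
model = whisper.load_model("small")  # larger models are more accurate but need more hardware

@app.post("/transcribe")
async def transcribe(audio: UploadFile):
    # Whisper reads from a file path, so write the upload to a temporary file first.
    with tempfile.NamedTemporaryFile(suffix=".webm") as tmp:
        tmp.write(await audio.read())
        tmp.flush()
        result = model.transcribe(tmp.name)
    return {"text": result["text"], "language": result["language"]}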
- Fabrice AI Avatar
I want to create a Fabrice AI Avatar that looks and sounds like me and that you can have a conversation with. I evaluated D-iD but found it way too expensive for my purposes. Eleven Labs is voice-only. Synthesia is great but does not currently create videos in real time. In the end, I decided to use HeyGen given its more appropriate pricing and functionality.
I suspect that at some point Open AI will release its own solution, so this work will have been for naught. I am comfortable with that and will switch to the Open AI solution when and if it comes. At this stage, the point of this entire exercise is to learn what’s possible with AI and how much work it requires, to help me understand the space better.
- Custom Dashboard
Right now, I need to run a MongoDB query to get an extract of the day’s questions and answers. I am building a simple dashboard where I can get extracts and simple statistics: the number of queries per language, the number of speech-to-text requests, etc.
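The statistics themselves are simple aggregations over the MongoDB log. As an illustration, the query below counts questions per day and per language; the field names ("timestamp", "language") are assumptions about the log schema.

# Sketch: count questions per day and per language from the MongoDB log.
import os
from pymongo import MongoClient

qa_log = MongoClient(os.environ["MONGODB_URI"])["fabrice_ai"]["qa_log"]

pipeline = [
    {"$group": {
        "_id": {
            "day": {"$dateToString": {"format": "%Y-%m-%d", "date": "$timestamp"}},
            "language": "$language",
        },
        "count": {"$sum": 1},
    }},
    {"$sort": {"_id.day": -1}},
]

for row in qa_log.aggregate(pipeline):
    print(row["_id"]["day"], row["_id"]["language"], row["count"])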
- Additional Data Sources
We just uploaded the FJ Labs Portfolio to Fabrice AI. You can now ask whether a company is part of the portfolio. Fabrice AI answers with a short description of the company and a link to its website.
Given the number of personal questions Fabrice AI was getting that it did not have the answers to, I took the time to manually tag every speaker in my 50th Birthday Video to give it the content it needed.
Conclusion
With all the work I have done over the past twelve months on all things AI-related, there seems to be a clear universal conclusion: the longer you wait, the cheaper, easier, and better it gets, and the more likely it is that Open AI will offer it! In the meantime, let me know if you have any questions.