Rules to Better AI Development - 11 Rules
Want to revolutionize your business with AI? Check SSW's Artificial Intelligence and Machine Learning consulting page.
The advent of GPT and LLMs has sent many industries for a loop. If you've been automating tasks with ChatGPT, how can you share the efficiency with others?
What is a custom GPT?
OpenAI's standard ChatGPT is pretty good at a lot of things, but there are some limitations. Creating a custom GPT means tailoring it for a specific purpose, with custom training data and system prompting. It turns ChatGPT into a ready-made assistant.
If you frequently use the same prompt or series of prompts, it is valuable to make a GPT that knows those instructions permanently.
There are 3 areas where a custom GPT can overcome the limitations of standard ChatGPT:
Retrieval Augmented Generation (RAG)
RAG is the term for retrieving additional data for your model at query time, data that other models do not have (or cannot access). Perhaps this is the IP of your company, or simply more up-to-date information on a given topic. If your model has a richer or more refined data set than the competition, it can perform better.
Instructions (System Prompt)
In a GPT you have the ability to define a set of initial instructions. That means you can bake in a great initial prompt, so users get high-quality responses even when their prompting skills are low. If you're a prompt wizard, your GPT will deliver better responses than everyone else's.
Custom actions
A huge area for innovation is being able to connect your GPT model to your own API, allowing you to take both the user input and perform additional logic before returning a response to the user. Some examples are executing code to test its validity, or looking up a weather forecast for the user's location before suggesting activities that day.
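To make this concrete, here is a minimal sketch in Python of how a custom action might be dispatched. The `get_weather` function, the tool schema, and the simulated tool call are all hypothetical; in a real custom GPT the model itself decides when to call your action and returns the arguments via OpenAI's function-calling mechanism.

```python
import json

# Illustrative tool schema in the style of OpenAI function calling.
# The function name and parameters are hypothetical examples.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the forecast for a location",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
}

def get_weather(location: str) -> str:
    # Stub: a real action would call a weather API here
    return f"Sunny and 24C in {location}"

ACTIONS = {"get_weather": get_weather}

def run_action(tool_call: dict) -> str:
    """Dispatch a tool call returned by the model to your own code."""
    name = tool_call["name"]
    args = json.loads(tool_call["arguments"])
    return ACTIONS[name](**args)

# Simulated model output asking us to run the action
result = run_action({"name": "get_weather", "arguments": '{"location": "Sydney"}'})
print(result)  # Sunny and 24C in Sydney
```

The result would then be sent back to the model so it can suggest activities based on the forecast.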
GPT Stores
Stores such as the OpenAI GPT Store and Bind AI let you quickly launch a custom GPT model and make it available (and monetizable) to the world. You can monetize your GPT if it gains enough traction:
✅ Pros
- Fast way to get your custom GPT model live
- Easily test your model's popularity and iterate on market feedback
- Minimal/no infrastructure or maintenance concerns
❌ Cons
- May be difficult to differentiate your model from everybody else's
- Revenue-sharing model may be disadvantageous
Alternative Solution - Bespoke product/service
Building a custom product or service (not on the GPT store) is great if you have the time, energy, and know-how. It can help springboard your startup into the next market unicorn, but requires a much larger time (and dollar) commitment.
✅ Pros
- Complete control over your product (UI, behaviour, pricing, etc.)
- Increased branding and marketability options
- Can become your MVP PaaS/SaaS offering at V1
❌ Cons
- Reliant on SEO to be discovered
- Product $$$ - typically much more expensive to get a V1 out the door
- Infrastructure $$$ - you pay for hosting and maintenance
Takeaways
AI is truly a disruptive technology. There will be many industries that rise and fall on the back of ideas from the public. Be innovative and creative with ChatGPT! Then be sure to come back and give this rule a thumbs up 🙂
GPT is an awesome product that can do a lot out-of-the-box. However, sometimes that out-of-the-box model doesn't do what you need it to do.
In that case, you need to provide the model with more training data, which can be done in a couple of ways.
Usually, for common scenarios GPT will already be adequate, but for more complex or highly specific use cases it will not have the required training to output what you need.
1. System Prompt
The system prompt is a prompt that is sent along with every request to the API, and is used to tell the model how it should behave.
Using the system prompt is the easiest way to provide additional data to GPT, but there are also some downsides to this approach.
✅ Benefits
- Easy to implement
- No extra setup cost
- Data can be easily changed or removed
❌ Disadvantages
- The system prompt counts towards total token count - not suitable for large amounts of data
- Large system prompts can limit the amount of tokens available for questions and responses
- Adds extra cost to each request
- Potential for inconsistency depending on what data is sent
2. Fine Tuning
OpenAI provides a way for you to train new data into the model so that it is always available, without having to provide it with each request.
For example, if you want to build an app that outputs SSW Rules based on a title, the untrained model probably won't know what SSW Rules are, so you need to train it.
✅ Benefits
- Suitable for larger amounts of data
- No extra cost per request as trained data lives on the server
- Consistent as trained data is available to all requests
❌ Disadvantages
- Harder to implement
- Extra setup cost to fine tune the model
- Model needs to be fine tuned again to change or remove data
- Fine tuning may not be available for every model
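As a rough sketch, OpenAI's fine-tuning API expects training examples in chat-formatted JSONL. The Python below shows the general shape of preparing such a file; the example content is invented, and the commented API calls indicate the upload step rather than production code.

```python
import json

# Each fine-tuning example is one JSON line in chat format.
# The rule title and body below are made-up placeholders.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You write SSW Rules from a title."},
            {"role": "user", "content": "Do you know the best way to name a branch?"},
            {"role": "assistant", "content": "Use a short, kebab-case description of the change..."},
        ]
    },
]

with open("training.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# The file is then uploaded and a fine-tuning job created, roughly:
# file = client.files.create(file=open("training.jsonl", "rb"), purpose="fine-tune")
# client.fine_tuning.jobs.create(training_file=file.id, model="gpt-3.5-turbo")
```

Once the job completes, you query the resulting fine-tuned model by name, with no need to resend the training data on each request.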
When you're building a custom AI application using a GPT API you'll probably want the model to respond in a way that fits your application or company. You can achieve this using the system prompt.
What is the system prompt?
Requests to and from a GPT API generally have 3 types of messages, also known as roles or prompts:
1. User
User messages are any messages that your application has sent to the model.
2. Assistant
Assistant messages are any messages that the model has sent back to your application.
3. System
The system prompt is sent with every request to the API and instructs the model how it should respond to each request.
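A minimal sketch of how the three roles fit together in a request body (the company name and message contents are made up):

```python
# The three roles as they appear in a chat completion request.
# "Northwind" and the messages are illustrative examples only.
messages = [
    {"role": "system",
     "content": "You are a helpful assistant for Northwind. "
                "Only answer questions about Northwind products."},
    {"role": "user", "content": "What products do you sell?"},
]

# When the API responds, the reply comes back with the assistant role
# and is appended to the history for the next request:
messages.append({"role": "assistant", "content": "We sell office supplies..."})

# The system prompt stays at the start and is re-sent with every request
roles = [m["role"] for m in messages]
print(roles)  # ['system', 'user', 'assistant']
```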
When we don't set a system prompt, the user can tell the model to act however they would like it to.
Note: Depending on the model you're using, you may need to be more firm with your system prompt for the model to listen. Test your prompt using OpenAI's Playground before deploying.
For more information on system prompts, see OpenAI's documentation, or use their playground to start testing your own!
There's lots of awesome AI tools being released, but combining these can become very hard as an application scales.
Semantic Kernel can solve this problem by orchestrating all our AI services for us.

What is Semantic Kernel?
Semantic Kernel is an open source SDK developed by Microsoft for their Copilot range of AI tools.
It acts as an orchestration layer between an application and any AI services it may consume, such as the OpenAI API or Azure OpenAI, removing the need to write boilerplate code to use AI.

Microsoft - What is Semantic Kernel?
Semantic Kernel - GitHub Repo

Why use Semantic Kernel?
Semantic Kernel offers many benefits over manually setting up your AI services.
- Common AI abstractions
- Resistant to API changes
- Services can be easily swapped (i.e. from Azure OpenAI to OpenAI API or vice versa)
- Faster development time
- Easier maintenance
Using Semantic Kernel, it's easy to set up a basic console chat bot in under 15 lines of code!
```csharp
using Microsoft.SemanticKernel;

// Read the Azure OpenAI endpoint and key from environment variables
// (these are runtime values, so they cannot be const)
string endpoint = Environment.GetEnvironmentVariable("AZUREOPENAI_ENDPOINT")!;
string key = Environment.GetEnvironmentVariable("AZUREOPENAI_API_KEY")!;
const string model = "GPT35Turbo";

var kernel = Kernel.Builder
    .WithAzureChatCompletionService(model, endpoint, key)
    .Build();

while (true)
{
    Console.WriteLine("Question: ");
    Console.WriteLine(await kernel.InvokeSemanticFunctionAsync(Console.ReadLine()!, maxTokens: 2000));
    Console.WriteLine();
}
```
For a more in depth walkthrough, please see Stephen Toub's article.
When using Azure AI services, you often choose between Small Language Models (SLMs) and powerful cloud-based Large Language Models (LLMs), like Azure OpenAI. While Azure OpenAI offers significant capabilities, it can also be expensive. In many cases, SLMs like Phi-3 can perform just as well for certain tasks, making them a more cost-effective solution. Evaluating the performance of SLMs against Azure OpenAI services is essential for balancing cost and performance.
A startup builds a simple FAQ chatbot that answers repetitive customer service questions like “What are your business hours?” or “How do I reset my password?” They choose to implement Azure OpenAI services, leading to high operational costs. An SLM could have provided the same answers without the extra expense.
Figure: Bad example - Using Azure OpenAI services for simple FAQ tasks incurs high costs, while an SLM would be more cost-effective
A financial services company needs to develop a chatbot to guide customers through complex mortgage applications, requiring the chatbot to understand intricate details and provide personalized advice. After evaluating both, they use Azure OpenAI GPT-4o due to its better handling of complex queries and personalized responses, which an SLM could not manage effectively.
Figure: Good example - Choosing Azure OpenAI GPT-4o for complex tasks after evaluation provides better customer service and justifies the higher cost
Why evaluate SLMs?
Cost considerations: Azure OpenAI services, such as GPT-4o, charge per usage, which can quickly add up. On the other hand, SLMs, which can be deployed locally or in a more cost-efficient environment, may offer similar results for less complex tasks, reducing overall costs
Performance needs: Not every task requires the full power of a cloud-based LLM. Tasks like text classification, keyword extraction, or template-based responses can often be handled just as well by an SLM, saving both on compute resources and cost
Model control: Using an SLM, particularly if it is deployed locally, offers more control over the model’s behavior, updates, and fine-tuning. This can be valuable for applications where privacy, security, or specific customizations are required
How to evaluate SLMs against Azure OpenAI services
Set performance benchmarks: Run both the SLM and Azure OpenAI services on the same dataset or task. Measure their performance in terms of accuracy, response quality, latency, and resource consumption
Compare output quality: Test how well each model responds to different types of queries, from simple to complex. While Azure’s LLMs might excel at complex, open-ended tasks, an SLM may be sufficient for simpler, well-defined tasks
Consider deployment environment: Evaluate whether the SLM can be easily integrated into your existing Azure infrastructure. Consider factors like memory and CPU requirements, latency, and whether an SLM can match the scalability offered by Azure’s cloud services
Estimate long-term costs: Calculate the ongoing costs of using Azure’s LLMs, factoring in API fees and compute resources. Then, compare these costs with the deployment and maintenance costs of an SLM, especially for high-volume applications. Long-term savings can be substantial when using SLMs for tasks where full LLM power is unnecessary
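The benchmarking steps above can be sketched as a small harness that runs the same test cases against both models and records accuracy and latency. The `slm` and `llm` stub functions below are hypothetical stand-ins for real API calls to, say, Phi-3 and GPT-4o; in practice you would replace them with actual client calls.

```python
import time

def evaluate(model_fn, cases):
    """Score a model on (prompt, required keyword) pairs and track total latency."""
    hits = 0
    start = time.perf_counter()
    for prompt, keyword in cases:
        if keyword.lower() in model_fn(prompt).lower():
            hits += 1
    return {"accuracy": hits / len(cases),
            "seconds": round(time.perf_counter() - start, 3)}

# Stubs standing in for real calls to an SLM (e.g. Phi-3) and an LLM (e.g. GPT-4o)
def slm(prompt):
    return "Our business hours are 9am to 5pm, Monday to Friday."

def llm(prompt):
    return "We are open from 9am until 5pm on weekdays."

cases = [("What are your business hours?", "9am")]
print(evaluate(slm, cases))
print(evaluate(llm, cases))
```

If both models pass the same benchmarks, the per-request cost difference becomes the deciding factor.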
When to stick with Azure AI’s cloud LLMs
- For complex tasks that require deep understanding, creativity, or nuanced language generation, an Azure OpenAI service like GPT-4o may still be the best choice
- Cloud-based LLMs offer ease of scalability and quick integration with Azure services, making them ideal for projects that need high availability or require rapid deployment without complex infrastructure management
By evaluating SLMs against Azure OpenAI services, you can make informed decisions that balance performance with cost, ensuring your AI deployments are both efficient and economical.
When integrating Azure AI's language models (LLMs) into your application, it’s important to ensure that the responses generated by the LLM are reliable and consistent. However, LLMs are non-deterministic, meaning the same prompt may not always generate the exact same response. This can introduce challenges in maintaining the quality of outputs in production environments. Writing integration tests for your most common LLM prompts helps you identify when model changes or updates could impact your application’s performance.
Why you need integration tests for LLM prompts
Ensure consistency: Integration tests allow you to check if the responses for your most critical prompts stay within an acceptable range of variation. Without these tests, you risk introducing variability that could negatively affect user experience or critical business logic
Detect regressions early: As Azure AI models evolve and get updated, prompt behavior may change. By running tests regularly, you can catch regressions that result from model updates or changes in prompt design
Measure prompt quality: Integration tests help you evaluate the quality of your prompts over time by establishing benchmarks for acceptable responses. You can track if the output still meets your defined criteria
Test edge cases: Prompts can behave unpredictably with edge case inputs. By testing common and edge case scenarios, you can ensure your AI model handles these situations gracefully
Best practices for writing LLM integration tests
Identify critical prompts: Focus on writing tests for the most frequently used or mission-critical prompts in your application
Set output expectations: Define a range of acceptable output variations for your test cases. This might include specific keywords, response length, or adherence to format requirements
Automate testing: Use continuous integration (CI) pipelines to automatically run your LLM integration tests after each deployment or model update
Log outputs: Log the outputs from your tests to detect subtle changes over time. This can help identify patterns in model behavior and flag potential issues before they become problematic
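One lightweight way to apply these practices is a response checker that your CI pipeline runs against model output. Everything below is an illustrative sketch; in a real test the `sample` text would come from a call to your deployed prompt, and the expectations would match your own acceptance criteria.

```python
def check_response(text, required=(), forbidden=(), max_words=150):
    """Validate an LLM response against simple output expectations.

    Returns a list of issues; an empty list means the response passed.
    """
    issues = []
    for term in required:
        if term.lower() not in text.lower():
            issues.append(f"missing required term: {term}")
    for term in forbidden:
        if term.lower() in text.lower():
            issues.append(f"contains forbidden term: {term}")
    word_count = len(text.split())
    if word_count > max_words:
        issues.append(f"too long: {word_count} words")
    return issues

# In CI this text would be a live (or recorded) model response
sample = "To reset your password, open Settings and choose Reset Password."
assert check_response(sample, required=["password", "Settings"]) == []
assert check_response(sample, forbidden=["password"]) != []
```

Because LLM output varies between runs, tests like this check for properties (keywords, length, format) rather than exact strings.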
A chatbot is a computer program that uses artificial intelligence to engage in text or voice conversations with users, often to answer questions, provide assistance, or automate tasks. In the age of generative AI, good chatbots have become a necessary part of the user experience.
Choosing the right chatbot service for your website can be a challenging task. With so many options available it's essential to find the one that best fits your needs and provides a good experience for your users. But what distinguishes a good chatbot from a great one? Here are some factors to consider.
Factors to consider
Development Effort/Cost
- Custom Built vs 3rd Party Service: Custom built provides more control but incurs high development effort & cost - usually 3rd party solutions are cheaper up front
- Pre-built/Drag-and-Drop Builders: Simplifies creation without coding
- Documentation & Support: Bad documentation can make a simple product hard to use - incurring more costs
Performance
- Responses: Smooth and natural responses that answer questions while understanding context
- Visual Design: Aligns with brand aesthetics
- Content Tailoring: Adapts responses to fit brand voice
Research and Training
- API Support: API integration if you might want to use it in other applications
- Data Syncing: How often does it refresh its data from your website?
Scalability
- Traffic Management: Handles varying user traffic levels
- Data Storage: Manages increasing user data
- Knowledge Base: There is usually a limit in 3rd party chatbots, e.g. Chatbase provides 11M characters, which roughly equates to ~3500 pages of text
Handling Curveballs
- Adaptive Responses: Adjusts to unexpected user inputs
- Feedback Loop: Improves from past interactions
- Human Agent Referral: Transfers smoothly to a human if needed
- Response Filtering: Is not tricked by misleading questions
Comparing platforms
The first decision is to choose between using a 3rd party chatbot service (e.g. ChatBase or Botpress) vs developing your own from scratch using a large language model API (e.g. OpenAI API).
| Factor | Directly from an API (e.g. OpenAI) | 3rd Party |
| --- | --- | --- |
| Development effort and cost | Very High | Low |
| Control | Very High | Moderate |
| Cost to Train | Very Low | Very Low |
| Knowledge Base Limits | Unlimited | Limited but Sufficient |
| Cost per Message | Moderate | High |

Before delving deeper into the comparison it would help to first understand the steps involved in building chatbots using either technology.
Using a 3rd Party service
After creating your account and starting a new project, you should:
- Choose the best large language model – e.g. in 2023 you'd choose GPT-4
- Craft a pointed prompt to give it instructions on how to respond to the user, e.g. you can ask it to share URLs to your web pages when appropriate
- Train the bot by providing links to your web pages or by uploading docs
- Configure the chatbot for features such as a greeting message, company logo, chat bubble colours, etc.
- Embed an iframe or javascript code provided by the service on your website
Creating a chatbot using an API (e.g. OpenAI API)
The following provides a very high level description of creating a chatbot from scratch using the OpenAI API. For a more in-depth explanation please watch:
Video: Exploring the Capabilities of ChatGPT | Calum Simpson | User Group (132 mins)

- Convert your knowledge base into embeddings
- Store embeddings and their corresponding text content in a vector database
- Set up a server that can do the following:
  - Convert the user query into an embedding
  - Look up the vector database to find embeddings that are closest to the embedding created from the user query
  - Insert the content corresponding to the matching embeddings into the OpenAI system message
  - Pass recent user chat history to the OpenAI API
  - Wait for OpenAI to generate a response, then present the response to the user
- Create a chatbot front-end widget
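The server-side retrieval steps above can be sketched in a few lines of Python. The toy 3-dimensional vectors below stand in for real embeddings (OpenAI's have 1536 dimensions), the hardcoded list stands in for a vector database, and the dot product stands in for the cosine similarity search a real database would perform.

```python
# Toy "vector database": (embedding, text) pairs. Real embeddings
# come from an embedding model and have hundreds of dimensions.
knowledge_base = [
    ([0.9, 0.1, 0.0], "Our office is open 9am-5pm weekdays."),
    ([0.0, 0.2, 0.9], "We ship orders within 2 business days."),
]

def dot(a, b):
    # Stand-in for the cosine similarity a vector database would use
    return sum(x * y for x, y in zip(a, b))

def build_system_message(query_embedding):
    """Find the closest knowledge-base chunk and insert it into the system message."""
    _, best_text = max(knowledge_base, key=lambda kb: dot(kb[0], query_embedding))
    return f"Answer using only this context: {best_text}"

# A real server would first embed the user's question; we use a toy vector here
print(build_system_message([0.8, 0.2, 0.1]))
```

The resulting system message, plus recent chat history, is what gets sent to the OpenAI API to generate the final answer.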
As you can see, developing a chatbot from scratch using the OpenAI API requires significant development effort and expertise. 3rd party chatbots on the other hand are much easier to program and embed on your website. As a rough estimate, assume it will take a developer 20 days to build a custom chatbot, or $20K up front (assuming the developer costs $1000/day). Compared with a $399/month subscription to Chatbase, it would take the custom solution over 4 years just to break even.
However, custom built chatbots provide a lot more control in how you train the AI model, what content you match the user query with, and what system message you provide the GPT engine to respond to a user's query. You don't get this level of control with 3rd party chatbots. The backend of custom built solutions can also be leveraged to serve multiple chatbots supporting completely different use cases, e.g. one chatbot could provide basic company info to visitors on the company website, while a second chatbot could help employees find info on the company Intranet.
Cost to train the chatbot on your knowledge base is very inexpensive in both options. For example, you can train a chatbot on ~3000 pages for less than $1 USD using the OpenAI Embeddings model.
Chatbase vs Botpress - 2 popular solutions
If you decide to go with a 3rd party service, you might be torn between 2 popular platforms: Botpress and Chatbase.
Video: Do you know the best chatbot for your website? (8 min)

| | GPT Integration | Customization | Pricing |
| --- | --- | --- | --- |
| Botpress | ❌ Traditional style of workflow and steep learning curve | ✅ Wide range of integrations | ✅ Free to start |
| Chatbase | ✅ Does everything with prompt engineering | ✅ Easy customization | ❌ Limited free plan options |

Making the right choice
While both platforms offer unique features, Chatbase stands out as the superior choice in most instances. Here's why:
- Easier customization and integration with various tools
- Chatbase's user-friendly interface makes it accessible to a wide range of users. A prompt engineer can set up, tweak, and improve the system. No development required
- Botpress lacks the intuitive interface of Chatbase, and without extensive workflow development and testing, it will fail in conversations
ChatGPT has an awesome API and Azure services that you can easily wire into any app.
The ChatGPT API is a versatile tool capable of far more than just facilitating chat-based conversations. By integrating it into your own applications, it can provide diverse functionalities in various domains. Here are some creative examples of how you might put it to use:
There are many different model types that you can use for different purposes.
- Automated Content Creation: Whether you’re generating blog posts, creating ad copy, or even writing a novel, the API can help streamline your creative process.
- Document Editing: The API can be integrated into word processors or documentation software to provide real-time suggestions, corrections, or even automatic content creation.
- E-Learning Platforms: From language learning apps to science tutoring software, you can utilize the API to create interactive, personalized learning experiences.
- Idea Generation Tools: Build a tool that interacts with the API to brainstorm innovative ideas, from business strategies to home decoration concepts.
- Coding Assistants: Whether it’s auto-generating pseudocode, suggesting code snippets, or providing guidance on best practices, you can create a valuable tool for programmers.
- Smart Home Automation: Enhance your smart home application by integrating ChatGPT to handle complex routines, provide usage suggestions, or interact with users more naturally.
- Project Management Software: Implement a smart assistant in your software that can help users plan, manage risks, or even generate project reports.
- Healthcare Apps: The API can be used to understand medical terminologies, describe symptoms, or provide basic health and wellness advice.
- Financial Management Tools: Integrate the API into your finance app for budget planning assistance, explaining complex financial terms, or even generating financial forecasts.
- Mental Health Apps: Use the API to guide mindfulness exercises, provide motivational messages, or suggest stress relief activities.
- Travel Planning Applications: Have the API generate itineraries, suggest interesting places, or provide information on local customs and etiquette.

These examples only begin to explore the potential of the ChatGPT API. The key is to think creatively about how this powerful language model can be leveraged to meet your specific application needs.
Embedding a user interface (UI) into an AI chat can significantly enhance user interaction, making the chat experience more dynamic and user-friendly. By incorporating UI elements like buttons, forms, and multimedia, you can streamline the conversation flow and improve user engagement.
Benefits of Embedding UI into AI Chat
Embedding UI elements in AI chats can:
- Simplify complex interactions by providing users with intuitive options.
- Enhance data collection through structured forms and inputs.
- Improve user experience with multimedia elements like images, videos, and interactive charts.
- Streamline navigation with quick-reply buttons and menus.
Implementing UI Elements in AI Chats
One library that can help you embed UI elements in AI chats is the Vercel AI SDK.
This SDK allows you to integrate AI into your chat applications through the use of React Server Components. Your LLM can stream UI directly to clients without the need for heavy JavaScript.
See here for a demo of the Vercel AI SDK in action: Vercel AI SDK Demo.
Examples
Use Cases
Embedding UI elements in AI chats is beneficial for various use cases, including:
- Customer support: Providing quick-reply buttons for common queries.
- E-commerce: Embedding product images and links for easy browsing.
- Surveys and feedback: Using structured forms to collect user responses.
- Booking and reservations: Streamlining the booking process with date pickers and dropdowns.
- Data visualization: Displaying interactive charts and graphs for data analysis.
Comparing and classifying text can be a very time-consuming process, especially when dealing with large volumes of data. However, did you know that you can streamline this process using embeddings?
By leveraging embeddings, you can efficiently compare, categorize, and even cluster text based on their underlying meanings, making your text analysis not only faster but also more accurate and insightful. Whether you're working with simple keyword matching or complex natural language processing tasks, embeddings can revolutionize the way you handle textual data.
What are embeddings?
Embeddings are powerful tools that transform text into numerical representations, capturing the semantic meaning of words, phrases, or entire documents.
It is a way of representing how similar pieces of text are, taking the form of a vector. You can think of an embedding as similar to a point in 2D space with an X and Y coordinate.
The difference is that embeddings have far more dimensions. For example, embeddings generated using OpenAI's embedding models have 1536 dimensions per vector!
What can embeddings be used for?
- Document clustering - Using embeddings you can group documents based on their content without the need to manually read and classify them.
- Search - Embeddings can speed up searches by a huge amount, given that you can search using the vector value as opposed to text. A good example of this is the SSW RulesGPT bot, which embeds the message you send it and uses the resulting vector to search for rules relevant to your question.
- Recommendations - Embedded text can be easily compared based on its content, making it perfect for recommending things like similar articles or books without the need to manually tag or categorise.
- Cross-lingual tasks - When you embed a piece of text the resulting embedding represents the meaning of the text and is not tied to any particular language.
This means you can use embeddings to compare different language texts without needing to read or translate the text!
How can embeddings be used?
When you have an embedding for two pieces of text you can perform a mathematical operation called cosine similarity, which measures the distance between the two vectors. The closer they are, the more similar the text.
Many databases such as Cosmos DB, Redis and Pinecone have inbuilt cosine similarity functions, making it easy to quickly compare embeddings. Other databases such as Postgres have plugins to handle vectors.
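As a small illustration, cosine similarity can also be computed directly. The tiny 3-dimensional vectors below are made-up stand-ins for real 1536-dimension embeddings, but the maths is the same: vectors pointing in a similar direction score close to 1.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors: "cat" and "kitten" should embed closer together than "cat" and "car"
cat = [0.8, 0.3, 0.1]
kitten = [0.7, 0.4, 0.1]
car = [0.1, 0.2, 0.9]

assert cosine_similarity(cat, kitten) > cosine_similarity(cat, car)
```

In practice you would not write this yourself for large datasets; a vector database performs the same comparison at scale with indexing.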
How do we get embeddings?
OpenAI provides specialized embedding models that are accessible via an API, similar to the GPT API. These models are generally cheaper than the GPT models, meaning large amounts of text can be embedded cheaply.
Find more information on these models and how to access them.
There are also open source and self hosted models available at Hugging Face.
Since the release of GitHub Copilot in 2021, we have witnessed a dramatic evolution in how developers work within their IDE. It started with a simple AI autocomplete, and has since progressed to a chat function. AI has now been integrated deeply into IDEs with products like Cursor and Windsurf, embedding an even deeper level of AI Integration within a developer's workflow.
Video: Let Cursor do the coding for you | Calum Simpson | SSW Rules (10 min)
Powerful features that AI-Powered IDEs provide
Code Completion
GitHub Copilot first popularized the 'code completion' feature for AI-powered IDEs. Code completion will try to guess what you are going to write, and suggest how to complete this line – saving time and effort by simply pressing 'tab'.
Command Generation
In Cursor and Windsurf, you can hit ctrl-K (or command-K), to convert natural language to a bash command. This is very useful for when you have forgotten the exact syntax of a bash command.
Chat
Chat functionality within an AI-powered IDE adds an intelligent assistant that can offer answers and suggestions without needing to leave the IDE. Unlike generic AI tools, it allows you to add context to your question, such as a file or even codebase, which lets the AI provide more tailored solutions.
Specify the level of context
Within the chat for Cursor, you can specify the level of context you would like to include with your prompt. By typing the `@` character, the following menu will appear.

In Cursor, the `@Web` function is very useful for any situations where recent information, or information that the model has not been trained on, is needed! You can also use `@Git` to compare diffs with the main branch, to generate a nice PR summary.

Agent
The Agent function in AI-powered IDEs represents a significant leap in a developer's workflow. It acts as a semi-autonomous coding assistant, capable of directly controlling the IDE (creating/editing files, reading the codebase, searching the web, executing bash commands).
AI-Powered IDE Comparison
| Feature | Cursor | IDE + GitHub Copilot | Windsurf | GitHub Copilot Workspace |
| --- | --- | --- | --- | --- |
| Free Version | • 2000 completions per month<br>• 50 slow premium requests per month | • 2000 completions per month<br>• 50 chat messages per month | | |