There is growing interest in the potential of generative AI and large language models (LLMs) in the enterprise. Companies are currently in the early stages of adoption, primarily experimenting with LLM APIs from companies such as OpenAI, AI assistants such as Microsoft Copilot, and specialized products designed for tasks such as image generation and marketing copywriting.
Pre-trained LLMs offer powerful capabilities such as language processing, data analysis, and content generation. However, these models are trained on public datasets rather than on specific company data. Without training on, and ongoing access to, corporate data, the full potential of LLM applications remains untapped. LLMs also have other notable limitations, including hallucinations, data privacy risks, and security concerns.
For custom LLM applications, the generative AI stack consists of several key components.
Of course, beyond this application-level stack, there's also the hardware infrastructure needed to train AI models and host them in production for real-time inference.
Considerations when planning an enterprise generative AI stack
Although LLM technology is promising, it also poses certain risks regarding accuracy, relevance, and security.
To alleviate these concerns and ensure businesses maximize ROI, it is important to understand the trade-offs associated with different technology choices and adopt modular component stacks. A modular generative AI stack increases flexibility and adaptability, allowing organizations to swap out components as needed and incorporate new technologies as they emerge.
Establishing guidelines and standards for the key components of your generative AI stack is critical to project success. Here are some important considerations.
Choosing an LLM
The LLM is the foundation of the generative AI stack. While some LLMs are widely recognized, such as those from OpenAI and Google, the LLM landscape is diverse and offers many options. These LLMs differ in their training data, best-fit use cases, and performance on common tasks.
LLMs come in different sizes, with a model's size determined by its number of parameters. Larger models may achieve higher accuracy but require more computational power and increase inference time. Similarly, LLMs support different context window sizes. A larger context window accommodates more detailed prompts and enables the model to produce more relevant, context-aware output.
Cost of using LLMs
LLM pricing is based on input size (the number of tokens in the original prompt) and output size (the number of tokens produced). Because users often try multiple prompts or regenerate initial output, it's important to account for these additional costs when budgeting. Researching different LLM providers and comparing pricing models can also help organizations find the most cost-effective model for their specific use case.
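As a rough illustration of this kind of budgeting, the sketch below estimates monthly spend from token counts. The per-token prices, retry rate, and request volume are placeholder assumptions, not any provider's actual rates.

```python
# Back-of-the-envelope cost estimate for a token-priced LLM API.
# All prices and volumes below are illustrative assumptions.
PRICE_PER_1K_INPUT = 0.0005   # USD per 1,000 input tokens (assumed)
PRICE_PER_1K_OUTPUT = 0.0015  # USD per 1,000 output tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int, attempts: int = 1) -> float:
    """Cost of one logical request, including repeated prompts or regenerations."""
    per_call = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
             + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    return per_call * attempts

# Example: a 1,500-token prompt, a 500-token answer, regenerated twice on average,
# at an assumed volume of 10,000 requests per month.
monthly = request_cost(1500, 500, attempts=2) * 10_000
print(f"Estimated monthly spend: ${monthly:,.2f}")
```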
Open source vs. proprietary LLM
Another important consideration is whether to choose an open source or proprietary (also known as closed source) LLM.
Open source options offer greater flexibility and control. For example, organizations can run these on their own IT infrastructure, whether on-premises or in a private cloud, allowing them to better monitor data privacy. However, running an open source LLM in-house requires a high degree of in-house technical expertise.
Proprietary options come with better support and regular updates, making them easier to implement and maintain. However, when using a vendor's LLM, it's important to carefully review the fine print on the provider's data processing practices to ensure they meet your organization's regulatory and compliance standards.
Cost is also a factor to consider when comparing open source vs. proprietary generative AI. With an open source LLM, organizations do not have to pay any usage fees, but IT leaders must invest in the infrastructure and personnel required to run and maintain an open source LLM. Conversely, proprietary LLMs typically charge based on usage, making them cost-effective for organizations with low or intermittent needs for LLM access.
Domain-specific vs. horizontal LLMs
Another important consideration is whether a domain-specific LLM or a horizontal LLM best suits your needs. Domain- or industry-specific LLMs are trained on data from specific sectors such as finance, law, and healthcare. These models are also optimized for common tasks and applications within those industries, producing output better tailored to the domain in question.
However, such LLMs may have limited value outside their intended domain. Use cases that require a broader knowledge base may be better served by a horizontal LLM: a more general model trained on a wide range of data across different domains. Horizontal LLMs can also be fine-tuned and adapted to specific domains using techniques such as transfer learning.
Approaches to improving enterprise LLM applications
When refining and customizing enterprise LLM applications, common approaches include prompt engineering, retrieval-augmented generation (RAG), and fine-tuning. The latter two techniques help incorporate additional domain- or company-specific data into LLM applications and address concerns about factual accuracy and relevance.
- Prompt engineering and templates. Prompt engineering is a lightweight approach that focuses on improving a model's output without changing the model's weights. Prompt templates and prompting best practices guide the model toward the desired output (see the template sketch after this list). Prompt marketplaces such as PromptBase offer a variety of prompts for different AI models.
- RAG. RAG is a technique for retrieving data from an enterprise repository or external sources to generate more contextual and accurate responses. Internal data is first stored as vectors in a vector database such as Pinecone or Chroma, and SDKs and frameworks such as LlamaIndex and LangChain facilitate connections between the LLM and data sources (see the retrieval sketch after this list).
- Fine-tuning. Fine-tuning is the practice of further training a pre-trained LLM on company- or domain-specific data. Unlike RAG, which leaves the LLM's weights unchanged, fine-tuning updates those weights to better capture domain-specific nuances (see the fine-tuning sketch after this list). Tools such as Snorkel and Databricks MosaicML can be used to fine-tune an LLM.
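To make prompt engineering concrete, here is a minimal prompt-template sketch using only the Python standard library. The template structure and placeholder names are illustrative, not drawn from any particular prompt library or marketplace.

```python
# A minimal prompt-template sketch: fill named placeholders, then send the
# rendered prompt to whichever LLM API the organization has chosen.
from string import Template

SUMMARY_TEMPLATE = Template(
    "You are a $role.\n"
    "Summarize the following text for a $audience audience in $length sentences.\n\n"
    "Text:\n$text"
)

prompt = SUMMARY_TEMPLATE.substitute(
    role="financial analyst",
    audience="non-technical executive",
    length="three",
    text="Quarterly revenue grew 12% year over year, driven by ...",
)
print(prompt)  # the rendered prompt is what gets sent to the model
```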
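The retrieval sketch below shows the basic RAG pattern of "retrieve, then generate" using the Chroma vector database mentioned above. The document contents are invented examples, and llm_complete() is a hypothetical stand-in for a call to any LLM provider.

```python
# Minimal retrieve-then-generate (RAG) sketch.
# Assumes the chromadb package (pip install chromadb); llm_complete() is a
# hypothetical placeholder, not a real library function.
import chromadb

def llm_complete(prompt: str) -> str:
    # Hypothetical stand-in: replace with a call to your chosen LLM API.
    return "[LLM response would be generated here]"

client = chromadb.Client()
collection = client.create_collection(name="company_policies")

# Index internal documents; Chroma embeds them with its default embedding model.
collection.add(
    ids=["doc-1", "doc-2"],
    documents=[
        "Expense reports must be filed within 30 days of travel.",
        "Remote employees are reimbursed for home office equipment up to $500.",
    ],
)

question = "How long do I have to file an expense report?"
results = collection.query(query_texts=[question], n_results=2)
context = "\n".join(results["documents"][0])

# Ground the prompt in the retrieved context before calling the model.
answer = llm_complete(
    f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
)
print(answer)
```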
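Finally, a fine-tuning sketch, here assuming the OpenAI Python SDK (v1.x) as one concrete example; other providers and tools such as Databricks MosaicML expose broadly similar workflows. The file name and base model name are illustrative.

```python
# Fine-tuning sketch: upload company-specific training examples and launch a
# job that updates the base model's weights on that data.
# Assumes the OpenAI Python SDK v1.x and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# JSONL file of prompt/response pairs drawn from company data (illustrative name).
training_file = client.files.create(
    file=open("support_tickets.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job against a base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id, job.status)
```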
After customizing and refining an enterprise LLM application, the next step is deployment and ongoing monitoring. Tools such as Portkey and Arize help deploy and monitor LLM applications in production, covering troubleshooting, updates, and enhancements.
Kashyap Kompella is an industry analyst, author, educator, and AI advisor to leading companies and startups in the United States, Europe, and Asia Pacific. Currently, he is the CEO of RPA2AI Research, a global technology industry analyst firm.