Why Companies Like OpenAI and Microsoft Are Venturing into Custom Chip Development

    As the demand for generative AI technology continues to rise, industry giants, including Microsoft, Google, AWS, and OpenAI, are exploring the development of their own custom chips tailored for AI workloads. Contrary to popular belief, the primary driver behind this push isn't chip shortages but rather a strategic shift toward optimizing the efficiency and cost-effectiveness of processing generative AI queries.

    Speculation has swirled around efforts by OpenAI and Microsoft to develop custom chips for handling generative AI tasks, with Microsoft collaborating with AMD on a project codenamed Athena and OpenAI rumored to be eyeing potential acquisitions to bolster its chip-design capabilities. Meanwhile, Google and AWS have already introduced their own chips for AI workloads: Google's Tensor Processing Units (TPUs) and AWS' Trainium and Inferentia chips.

    So, what's motivating these companies to delve into custom chip development? Analysts and experts point to two key factors: the cost of processing generative AI queries and the efficiency of existing chips, primarily Graphics Processing Units (GPUs). Currently, Nvidia's A100 and H100 GPUs dominate the AI chip market, but their efficiency in handling generative AI workloads is under scrutiny.

    Nina Turner, a research manager at IDC, notes that GPUs may not be the most efficient processors for generative AI tasks, and that custom silicon could potentially address this efficiency gap. GPUs, while highly effective at matrix multiplication, the fundamental mathematical operation underlying AI workloads, are costly to operate. Silicon optimized for specific AI workloads could help alleviate those cost concerns.
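To see why matrix multiplication dominates AI compute, consider its raw operation count. The sketch below is purely illustrative (the dimensions are hypothetical and not from the article); it counts floating-point operations for a single matrix product of the kind LLMs perform billions of times per query:

```python
# Rough FLOP count for one matrix multiplication C = A @ B,
# where A is (m x k) and B is (k x n): each of the m*n output
# elements needs k multiplies and k adds, so ~2*m*n*k operations.
def matmul_flops(m: int, k: int, n: int) -> int:
    return 2 * m * n * k

# Hypothetical example: a batch of 2048 token embeddings of
# dimension 4096 multiplied by a 4096 x 4096 weight matrix.
flops = matmul_flops(2048, 4096, 4096)
print(f"{flops / 1e9:.1f} GFLOPs")  # ~68.7 GFLOPs for one layer's matmul
```

A single such product is tens of gigaflops, and a transformer chains many of them per token, which is why hardware that moves weights into its multiply units more efficiently than a general-purpose GPU can translate directly into lower query costs.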

    Custom silicon, according to Turner, has the potential to reduce power consumption, improve compute interconnectivity, and enhance memory access, ultimately lowering query costs. For instance, OpenAI's cost to operate ChatGPT is roughly $694,444 per day, which works out to about 36 cents per query, according to a report from research firm SemiAnalysis.
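Taken together, the two quoted figures also imply a daily query volume, a derived estimate that is not stated in the article. A quick back-of-the-envelope check:

```python
# Figures quoted from the SemiAnalysis report; the query volume
# below is derived from them, not reported directly.
daily_cost_usd = 694_444   # reported daily operating cost
cost_per_query = 0.36      # reported cost per query

queries_per_day = daily_cost_usd / cost_per_query
print(f"~{queries_per_day:,.0f} queries/day")  # ~1,929,011 queries/day
```

At roughly two million queries a day, even a few cents shaved off each query by more efficient silicon compounds into tens of thousands of dollars in daily savings.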

    Furthermore, custom silicon provides the advantage of exerting control over chip access and designing elements tailored specifically for large language models (LLMs), thereby enhancing query speed.

    This shift towards custom chip design is likened to Apple's approach to producing chips for its devices, where specialized silicon trumps general-purpose processors. Despite the popularity of Nvidia's GPUs, they, too, are general-purpose devices. Custom chips could be the answer to optimizing performance for specific functions, such as image processing and specialized generative AI.

    However, experts caution that developing custom chips is no easy feat. It involves significant challenges, including high investment requirements, lengthy design and development timelines, complex supply chain issues, a scarcity of talent, and the need for a sufficient volume of production to justify the expenditure.

    For companies embarking on this journey from scratch, the process can take a minimum of two to two and a half years, with the scarcity of chip design talent causing delays. Several large tech companies have mitigated this challenge by either acquiring startups with expertise in chip development or partnering with experienced firms in the field.

    Despite ongoing discussions about chip shortages, experts believe that the move towards custom chip development by companies like OpenAI and Microsoft is more about addressing inference workloads for LLMs, particularly as Microsoft continues to incorporate AI features into its applications. These companies appear to have specific requirements that existing solutions don't meet, and a specialized inference chip, more cost-effective and efficient than large GPUs, may be the answer.

    Acquiring a major chip designer may not be a cost-effective approach for OpenAI, given the substantial expenses involved in designing and producing custom chips. Instead, experts suggest that OpenAI could explore the acquisition of startups with AI accelerators, a more economically viable option.

    To support inferencing workloads, potential acquisition targets could include Silicon Valley firms like Groq, Esperanto Technologies, Tenstorrent, and NeuReality. Additionally, SambaNova might be a suitable candidate if OpenAI is willing to transition away from Nvidia GPUs and adopt an on-premises approach, moving beyond a cloud-only paradigm.