In August 2023, Meta launched Code Llama, a large language model designed specifically for coding tasks and built on its earlier Llama 2 model. Today, the company has unveiled “a more powerful new version”, Code Llama 70B.
Code Llama 70B has been trained on 500 billion tokens of code and code-related data. It can process and generate longer code sequences thanks to a context window of 100,000 tokens.
According to Meta, Code Llama 70B uses a technique called self-attention to understand code structures. Given a text prompt or a code snippet, it can generate implementations of sorting, searching, and many other algorithms in languages such as Python, C++, JavaScript, and Java.
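For readers unfamiliar with the term, the idea behind self-attention can be sketched in a few lines of Python. This is a toy illustration, not Meta's implementation: the function names and the two-dimensional token vectors are hypothetical, the learned query, key, and value projections of a real model are replaced by the raw vectors, and the causal mask (each token only looks at itself and earlier tokens) is assumed because Llama-family models generate text left to right.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, causal=True):
    """Toy scaled dot-product self-attention.

    tokens: list of equal-length vectors, one per token.
    Assumption: queries, keys, and values are the token vectors
    themselves; a real model applies learned projections first.
    Returns (outputs, weights).
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for i, query in enumerate(tokens):
        # With a causal mask, token i only attends to tokens 0..i.
        visible = i + 1 if causal else len(tokens)
        scores = [
            sum(q * k for q, k in zip(query, tokens[j])) / scale
            for j in range(visible)
        ]
        row = softmax(scores) + [0.0] * (len(tokens) - visible)
        # Each output is a weighted average of the visible token vectors.
        mixed = [
            sum(row[j] * tokens[j][d] for j in range(len(tokens)))
            for d in range(dim)
        ]
        weights.append(row)
        outputs.append(mixed)
    return outputs, weights


# Three hypothetical token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token draws on every token in the sequence. In a code model, this mechanism is what lets a later token, such as a variable use, pull information from an earlier one, such as its definition.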
What does Code Llama 70B offer, and why is it a step forward?
This AI-based tool includes variants that have been fine-tuned for specific tasks. One of them is CodeLlama-70B-Instruct, which has been trained to understand natural language instructions.
There is also a Python-focused version called CodeLlama-70B-Python. With additional training on 100 billion Python code tokens, it generates Python with “unmatched fluency and accuracy,” according to Meta.
Code Llama 70B can be freely downloaded under the same open license as previous Code Llama models. Meta states that this permissive license allows both academic and commercial users to modify the model.
This puts Code Llama 70B in direct competition with Copilot, the coding assistant from GitHub and Microsoft. Developers can ask Copilot questions about their code, get explanations of specific parts of it, and even have Copilot fix errors.
Google also recently made Duet AI, its AI-based code generation and completion tool, and Gemini Pro available to the general public.