Archive - Aziz et al. Paper Summaries

GPT-2 Implementation From Scratch For Dummies!

If you have ever struggled to understand the code behind the attention mechanism, or the complex code that accompanies research papers, this is for you.

Apr 6

June 2024

Chameleon, Meta's Mixed-Modal Foundation Model

Recently, Meta released a new family of mixed-modal foundation models.

Jun 2, 2024

April 2024

Phi-3, your Pocket LLM!

Microsoft recently introduced the Phi-3 family of models, which are both highly performant and small enough to run on your phone without…

Apr 29, 2024

DoRA is The New LoRA!

The true strength of LLMs lies in how we can employ them for our own needs using our specific data. Various challenges arise when attempting this.

Apr 7, 2024

March 2024

Complete Summary of Absolute, Relative and Rotary Position Embeddings!

Position embeddings are widely used in recent LLMs. In this article, I explore the concept behind them and discuss the different types of position…

Mar 31, 2024

Is Step Back Prompting The Best Prompting Strategy?

Despite the proficiency of LLMs in conversations and information retrieval, they still struggle with multi-step reasoning tasks.

Mar 20, 2024

What Is SwiGLU? How to Implement It? And Why Does It Work?

If you have been in the AI space lately, specifically in Large Language Models, you might have seen the word "SwiGLU" thrown around here and there in the…

Mar 13, 2024

Dissecting OLMo, The Most Open Source LLM Paper!

Over the course of the last year, since the wave of LLMs started, many companies and organizations have released numerous models and papers.

Mar 6, 2024

December 2023

Does ZEPHYR 7B Outperform 70B Llama-2 Chat?

Zephyr 7B is a model released by Hugging Face. According to the technical report, Zephyr-7B outperforms Llama-2-Chat on the MT-Bench benchmark.

Dec 13, 2023

October 2023

How Speculative Sampling Can Increase Your LLM's Inference Speed!

Over the last few years of NLP progress, we have seen transformer models steadily improve in performance as they grow larger and larger.

Oct 7, 2023

September 2023

Cloze-Driven Pretraining of Self-Attention Networks Summary

This paper introduces a new pre-training method for bidirectional transformers that improves performance on a variety of language understanding…

Sep 29, 2023

Your Complete Guide to R-CNN, Fast R-CNN, Faster R-CNN and Mask R-CNN

In this article, I provide a detailed overview and summary of the R-CNN family.

Sep 24, 2023
