📬 Weekly AI Catch Up: This Week in AI & Generative AI.
Discover innovative AI research, new models, industry insights, and fascinating interviews. From matrix multiplication-free LLMs to AI-powered digital twins, we cover the latest advancements.
Welcome to this week’s edition of our AI newsletter! We have exciting updates on innovative AI research, new models, industry insights, and fascinating interviews: matrix multiplication-free LLMs, AI-driven training improvements, multimodal LLMs with sketching abilities, the latest in humanoid robotics, AI-powered digital twins, and cutting-edge developments from leaders like Microsoft and Google DeepMind. Let's dive in!
Favorite Updates this week:
1. Matrix Multiplication-Free LLMs
This study generated a lot of discussion! It proposes a novel training approach that eliminates the need for matrix multiplication, reducing memory usage by 10x and increasing training speed by 25.6%, offering a more efficient path for training LLMs. https://arxiv.org/abs/2406.02528
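The core trick is that once weights are constrained to {-1, 0, +1}, a linear layer's matmul collapses into sums and differences of input elements. A minimal pure-Python sketch of this idea (illustrative only; the paper's actual architecture and quantization differ):

```python
def ternary_quantize(w):
    """Round full-precision weights to {-1, 0, +1} with a simple mean-based threshold."""
    flat = [abs(v) for row in w for v in row]
    t = 0.5 * sum(flat) / len(flat)
    return [[1 if v > t else -1 if v < -t else 0 for v in row] for row in w]

def matmul_free_linear(x, w_ternary):
    """Compute W @ x using only additions and subtractions (no multiplies)."""
    return [sum(xj for xj, wij in zip(x, row) if wij == 1)
            - sum(xj for xj, wij in zip(x, row) if wij == -1)
            for row in w_ternary]

w = [[0.9, -0.1, -0.8], [0.2, 0.7, -0.6]]
x = [1.0, 2.0, 3.0]
wq = ternary_quantize(w)
# The add-only path matches an ordinary matmul against the quantized weights.
reference = [sum(wij * xj for wij, xj in zip(row, x)) for row in wq]
assert matmul_free_linear(x, wq) == reference
```

Removing the multiplies is what unlocks the memory and speed gains: accumulating additions is far cheaper in hardware than full-precision multiply-accumulate.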
2. Give the LLMs a pen and paper: Sketching as a Visual Chain of Thought for Multimodal LLMs
I personally liked this one: 📝 Visual Chain-of-Thought for Multimodal Language Models.
Current chain-of-thought and tool-use paradigms rely on text alone for intermediate reasoning steps. Researchers have now enhanced multimodal LLMs with drawing and sketching abilities, improving their chain-of-thought reasoning. This approach, similar to how humans use sketches and notes, has shown a 10% improvement in reasoning tasks.
It's fascinating to see how replicating human cognitive strategies can lead to significant advancements in AI.
https://arxiv.org/abs/2406.09403
3. Can LLMs Invent Better Ways to Train LLMs?
Researchers at Sakana have explored the potential of using large language models (LLMs) to improve the training of LLMs themselves. This approach demonstrates how AI can be leveraged to enhance its own development, discovering better algorithms that align with human preferences. https://sakana.ai/llm-squared/
4. HumanPlus: Enhancing Humanoid Robots
Stanford University's HumanPlus project introduces a full-stack system enabling humanoid robots to learn and perform tasks by shadowing human activities. Utilizing a transformer model, these robots can efficiently learn motion and autonomous skills. https://arxiv.org/abs/2406.09403
Interesting Thoughts
5. Will Every Mathematician Soon Use AI as a Co-Pilot?
Fields Medalist Terence Tao discusses how AI tools, particularly proof checkers and AI programs, are revolutionizing mathematics by acting as co-pilots for mathematicians. https://www.scientificamerican.com/article/ai-will-become-mathematicians-co-pilot/
6. AI-Powered Digital Twins for Meetings
Zoom founder Eric Yuan envisions a future where AI-powered digital twins can attend meetings on behalf of users, transforming the enterprise software landscape. https://www.theverge.com/2024/6/3/24168733/zoom-ceo-ai-clones-digital-twins-videoconferencing-decoder-interview
Model Updates
7. Microsoft's Aurora
Microsoft researchers have introduced Aurora, a new foundation model trained on over 1M hours of weather and climate data. This 1.3B parameter model can generate a 5-day global air pollution prediction in under 60 seconds. https://www.microsoft.com/en-us/research/blog/introducing-aurora-the-first-large-scale-foundation-model-of-the-atmosphere/
8. Husky: A Unified, Open-Source Language Agent
Husky, an open-source language agent, showcases impressive multi-step reasoning capabilities, matching or exceeding the performance of frontier models like GPT-4. https://arxiv.org/abs/2406.06469
9. Real-Time Transcription with Whisper
Whisper can now run real-time transcription locally on your browser, enhancing accessibility and usability. https://huggingface.co/spaces/Xenova/realtime-whisper-webgpu
10. Samba 3.8B: A High-Performance Model
The Samba 3.8B model, featuring the Mamba+Sliding Window Attention architecture, significantly outperforms Phi3-mini on key benchmarks, offering infinite context length with linear complexity. https://arxiv.org/abs/2406.07522
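Sliding-window attention is what keeps Samba's complexity linear: each token attends only to a fixed-size window of recent tokens, so total attention work grows with sequence length times window size rather than quadratically. A minimal sketch of such a mask (illustrative only, not Samba's implementation):

```python
def sliding_window_mask(seq_len, window):
    """mask[i][j] == 1 iff position i may attend to position j,
    i.e. j is among the last `window` positions up to and including i."""
    return [[1 if 0 <= i - j < window else 0 for j in range(seq_len)]
            for i in range(seq_len)]

mask = sliding_window_mask(6, 3)
# Attended pairs grow as O(seq_len * window), i.e. linearly in seq_len.
assert sum(sum(row) for row in mask) == 15  # 1 + 2 + 3 + 3 + 3 + 3
```

Samba interleaves this windowed attention with Mamba layers, which carry long-range context in a recurrent state, so the model can handle effectively unbounded context without quadratic cost.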
11. OpenVLA: Vision-Language-Action Model
OpenVLA is a new open-source vision-language-action model that sets a new state of the art for generalist robot manipulation policies. It supports controlling multiple robots out of the box and can be quickly adapted to new robot setups via parameter-efficient fine-tuning. The OpenVLA checkpoints and PyTorch training pipeline are fully open-source, and the models can be downloaded and fine-tuned from Hugging Face.
https://openvla.github.io
Interesting Interviews
12. Ilya Sutskever on Transformers
Ilya Sutskever discusses the sufficiency of transformers in AI and their potential future developments.
13. Demis Hassabis of Google DeepMind
An insightful conversation with Demis Hassabis covers various aspects of AI development at Google DeepMind.
Noteworthy Papers
14. Lamini Memory Tuning: How to Be More Accurate than RAG?
Introducing Lamini Memory Tuning, a method achieving 95% LLM accuracy with 10x fewer hallucinations.
https://www.lamini.ai/blog/lamini-memory-tuning
15. Self-Tuning and Self-Teaching LLMs
This paper explores how LLMs can acquire new knowledge effectively through self-teaching methodologies. https://arxiv.org/abs/2406.06326
16. SelfGoal: Achieving High-Level Goals to Improve Complex Reasoning
A proposed approach for LLMs to adaptively break down high-level goals into practical subgoals during interaction. https://arxiv.org/abs/2406.04784
17. VALL-E 2: Human Parity Zero-Shot TTS
Microsoft’s VALL-E 2 achieves human parity in zero-shot text-to-speech performance. https://arxiv.org/abs/2406.05370
18. ShiftAddLLM: Efficient LLM Reparameterization
Introducing ShiftAddLLM, which reparameterizes pretrained LLMs for efficiency, cutting memory and energy use by 80% compared to the original models. https://arxiv.org/abs/2406.05981
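The name hints at the trick: expensive multiplications are reparameterized into hardware-cheap bit-shifts and additions. The classic shift-and-add integer multiplier below illustrates the underlying idea (a simplified illustration, not the paper's actual quantization scheme):

```python
def shift_add_mul(x, m):
    """Multiply integer x by non-negative integer m using only shifts and adds.
    Each set bit k of m contributes x << k to the product."""
    result, shift = 0, 0
    while m:
        if m & 1:              # bit k of m is set
            result += x << shift  # add x * 2**k without multiplying
        m >>= 1
        shift += 1
    return result

assert shift_add_mul(7, 13) == 91  # no multiply inside the loop
```

Shifts and adds map to much smaller, lower-power circuits than multipliers, which is where the memory and energy savings come from.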
19. Leveraging AI-Synthetic Data
Synthesized data from generative models is increasingly considered as an alternative to human-annotated data for fine-tuning Large Language Models. This raises concerns about model collapse: a drop in performance of models fine-tuned on generated data. A new algorithm uses rank-and-prune feedback to maintain and even surpass original performance while leveraging synthetic data. https://arxiv.org/abs/2406.07515
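The mechanism can be pictured as a simple filter over the synthetic pool: score each generated sample with a feedback signal, then keep only the top-ranked fraction for fine-tuning. A hedged sketch, where `score_fn` is a hypothetical stand-in for the paper's feedback signal (e.g. a reward model or verifier):

```python
def rank_and_prune(samples, score_fn, keep_ratio=0.5):
    """Rank synthetic samples by a feedback score and prune the bottom fraction."""
    ranked = sorted(samples, key=score_fn, reverse=True)
    keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:keep]

# Toy example: "score" is string length, keeping the top half of the pool.
pool = ["a good long sample", "ok", "mid sample", "x"]
kept = rank_and_prune(pool, score_fn=len, keep_ratio=0.5)
assert kept == ["a good long sample", "mid sample"]
```

By discarding the low-quality tail of generated data before fine-tuning, this kind of feedback loop avoids the degradation associated with model collapse.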
20. Towards Lifelong Learning of LLMs - LLMs learning over time!
This paper addresses the challenge of enabling LLMs to learn continuously and adaptively throughout their operational lifetime.
https://arxiv.org/abs/2406.06391
21. Transformers Meet Neural Algorithmic Reasoners
Google’s latest paper introduces Neural Algorithmic Reasoners, offering significant gains over traditional transformers for complex algorithmic tasks. https://arxiv.org/abs/2406.09308
Industry Updates
22. Apple Intelligence Announcement
Apple teases a future where seamless AI integration into daily life becomes a reality. https://x.com/karpathy/status/1800242310116262150
23. NVIDIA’s AI Predictions
NVIDIA’s CEO Jensen Huang predicts a future dominated by robotic factories orchestrated by AI. https://qz.com/ai-next-wave-robots-nvidia-jensen-huang-blackwell-rubin-1851515953
24. Former NSA Director Joins OpenAI Board
OpenAI appoints a retired US Army General to its board, signaling a strategic move in AI governance. https://openai.com/index/openai-appoints-retired-us-army-general/
25. OpenAI's Booming Revenue
OpenAI’s annualized revenue has reportedly reached $3.4 billion, reflecting its growing impact in the AI industry. https://www.engadget.com/openais-revenue-is-reportedly-booming-230324957.html
New Frameworks
26. LaVague: Build AI Web Agents in 10 Lines of Code
A new framework allows you to build AI Web Agents that outperform Gemini and ChatGPT in information retrieval with minimal coding.
https://github.com/lavague-ai/LaVague
Tutorials
27. Reproducing GPT-2 from Scratch
A comprehensive video tutorial that covers the entire process of building and optimizing GPT-2.
28. Step-by-Step Diffusion: An Elementary Tutorial
This paper provides a detailed tutorial on the diffusion process, essential for understanding modern generative models. https://arxiv.org/abs/2406.08929
Miscellaneous
29. ARC Prize: AI Competition
The ARC Prize offers $1,000,000 for creating an AI that can adapt to novelty and solve reasoning problems: https://arcprize.org/. See also an interview with its creator, François Chollet: https://www.dwarkeshpatel.com/p/francois-chollet
30. DenseAV: Discovering Language from Videos
MIT researchers have developed a new algorithm that learns language by watching videos. https://news.mit.edu/2024/denseav-algorithm-discovers-language-just-watching-videos-0611
31. Towards a Personal Health LLM
Google’s Gemini model is fine-tuned for personal health data, showing promise in applications like sleep and fitness tracking. https://arxiv.org/abs/2406.06474
32. Google DeepMind's Virtual Rodent
DeepMind’s new project features a virtual rodent with a neural network that mimics real rat behavior, providing insights into neural activity. https://www.sciencedaily.com/releases/2024/06/240611130418.htm
Stay tuned for more cutting-edge updates and insights in the world of AI. Thank you for reading, and see you next week!