With Blackwell GPUs, AI Gets Cheaper And Easier, Competing With Nvidia Gets Harder

By 5gantennas.org | March 19, 2024


If you want to take on Nvidia on its home turf of AI processing, then you had better bring more than your A game. You better bring your A++ game, several vaults of money, and a few bags of good luck. Maybe a genie in a bottle would help, too.

The launch of Nvidia's "Blackwell" GPUs today at the 2024 GPU Technology Conference in San Jose represents the seventh, and most impressive, generation of datacenter-class GPUs from the compute engine maker. The GPU computing wave started in the mid-2000s, but it really began to crystallize with the launch of the "Kepler" K10 and K20 accelerators in May 2012.

And since that time, Nvidia has relentlessly driven Moore's Law advances in transistors, advanced packaging, embiggening vector and matrix math engine designs, ever-reducing floating point math precision, and enfattening memory capacity and bandwidth to create compute engines that are 4,367X faster in terms of raw floating point performance than the base K10, with its two GK104 GPUs, was a dozen years ago. (And 8X of that performance jump comes from reducing precision from FP32 single-precision to FP4 eighth-precision floating point math, so the chip performance gain at constant precision is 546X.)
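As a back-of-the-envelope check on that claim, the arithmetic is simple. A quick sketch, using the figures above and our reading that the precision factor is the FP32-to-FP4 shrink:

```python
# Rough check of the generational performance claim above.
total_gain = 4367        # raw FP throughput gain, Blackwell vs. Kepler K10
precision_gain = 32 / 4  # FP32 -> FP4 is an 8X reduction in precision
iso_precision_gain = total_gain / precision_gain
print(round(iso_precision_gain))  # 546, matching the constant-precision figure cited above
```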

And with advances in NVLink networking, hyperscalers, cloud builders, HPC centers, and others can couple the memory and compute of hundreds of GPUs together tightly, and with the advances in InfiniBand and Ethernet networking they can lash tens of thousands of GPUs together more loosely to build immensely powerful AI supercomputers that can also run HPC and data analytics workloads a whole lot faster, too.

How much faster the "Blackwell" B100 and B200 GPU accelerators are than their predecessors, the "Hopper" H100 and H200 GPUs launched in 2022 and 2023, respectively, remains to be seen. A lot of the architectural and performance details have yet to be divulged as we write this slightly ahead of the keynote presentation by Nvidia co-founder and chief executive officer Jensen Huang. We will be doing a follow-up story on the systems using the Blackwell GPUs as well as the usual architectural and economic deep dives on the new GPUs, stacking them up against their predecessors at Nvidia and the compute engines from AMD, Intel, and others.

AI Is Firmly In The Architectural Driver’s Seat

If the needs of the HPC sector for more floating point performance, and lower energy consumption for that performance, drove Nvidia's initial compute designs, it has been clear since half-precision FP16 units were added with the "Pascal" generation in 2016, and Tensor Core matrix math engines were added with the "Volta" generation only a year later in 2017, that machine learning workloads – specifically, deep learning neural networks – have been driving Nvidia's architectural choices.

And with the Hopper and now Blackwell compute engines, the large language models that support generative AI are pushing the architecture even harder to drive down the cost of doing ever-larger AI training and inference workloads.

"Last year, in 2023, we experienced the birth of multimodal generative AI, where text can create images, images can create text, audio can create 3D – and not just in human modalities, but in areas of science such as weather or DNA, molecules, proteins, and drug discovery," said Ian Buck, vice president of hyperscale and HPC at Nvidia, in a prebriefing ahead of the conference. "A new kind of AI is emerging as a result. One that is even more intelligent, one that's been built not just as a single AI model, but as a collection of AI models, called a mixture of expert models – like Google Gemini or Meta NLLB or Mistral AI and of course OpenAI GPT-4. These new models actually take multiple AI models and have them operating in concert. And for every layer of a transformer, they are sharing their information to decide who has the best answer for the next layer to build even more intelligent models. This allows AI to scale further – to trillion parameter models – than we have ever seen before. Of course, the challenge of that is computing. As models are getting larger, training takes more compute. Also, inference is becoming a bigger and bigger part of the challenge."
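To make the mixture-of-experts idea concrete, here is a minimal sketch of top-k expert routing at a single layer, in plain Python with toy dimensions we made up for illustration; it shows the general technique Buck describes, not any particular model's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 4, 2   # toy sizes, chosen arbitrarily

# Each "expert" is just a small feed-forward weight matrix here.
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts))  # learned in a real model

def moe_layer(x):
    # The router scores each expert for this token, softmax-normalized.
    scores = np.exp(x @ router)
    scores /= scores.sum()
    # Only the top-k experts actually run, which is what lets parameter
    # counts grow toward trillions without inference cost growing with them.
    chosen = np.argsort(scores)[-top_k:]
    weights = scores[chosen] / scores[chosen].sum()
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)  # (16,)
```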

Blackwell, in its various guises, is meant to address all of these challenges better than Hopper could.

(This seventh generation of true GPU compute engines is named after David Blackwell, a member of the National Academy of Sciences and a former professor at the University of California at Berkeley who did research in game theory, information theory, and probability and statistics.)

The Blackwell GPU complex has 208 billion transistors and is etched with a tweaked version of the 4 nanometer process from Taiwan Semiconductor Manufacturing Co called 4NP, which is a refined version of the custom 4N process that Nvidia used to etch the Hopper GPUs. The Blackwell device is actually composed of two reticle-sized GPU chips, each with 104 billion transistors, that are lashed together using NVLink 5.0 interconnects along the middle of the dies, like a zipper stitching them together.

Because Nvidia was unable to use TSMC's 3N 3 nanometer process, which apparently still has some kinks in it and which was being talked about nearly three years ago, we think the Blackwell chips are a bit bigger and hotter than they might otherwise have been. They might be running at slightly slower clock speeds, too. But each Blackwell die is going to pack about 25 percent more floating point oomph than a Hopper die does, and then there are two of them in each package for a combined 2.5X increase in performance. Dropping down to FP4 eighth-precision floating point processing will double that again to a 5X boost in raw performance. The actual performance on workloads could be higher depending on memory capacity and bandwidth configurations on the various Blackwell devices.
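The compounding works out like this; a quick sketch of our own arithmetic on the figures above, with everything else held equal:

```python
per_die_gain = 1.25       # each Blackwell die: ~25 percent more FP oomph than a Hopper die
dies_per_package = 2
iso_precision = per_die_gain * dies_per_package  # vs. a single-die Hopper package
fp4_gain = iso_precision * 2                     # halving precision to FP4 doubles throughput again
print(iso_precision, fp4_gain)                   # 2.5 5.0
```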

Buck says that there are six core technologies that have made the Blackwell GPUs, which will ship later this year, possible.

Those two Blackwell dies are zippered together with a 10 TB/sec NVLink 5.0 chip-to-chip link, called NV-HBI, which is presumably short for High Bandwidth Interconnect. And importantly, Buck confirms that these two chips present a single GPU image to software and are not just two GPUs sitting side by side, as has been the case with prior multi-die GPUs from both Nvidia and rival AMD.

This is important because if a device looks like a single unit, it is programmed like a single unit and the network talks to it like a single unit, and therefore it is the unit of scale across the cluster. If this were not true, then a cluster would not scale as far as you might think it can. (How badly the scale is impaired would depend on the way the network talked to each die, but in the best case scenario of a bad situation, it would cut the cluster's compute scale in half.)

We don't have many specific feeds and speeds for the B100 and B200 devices, but we do know that the top-end Blackwell with all of its features turned on – and don't assume that this will be the case with the B200 – currently has 192 GB of HBM3E memory.

But as the package shots make clear, there are four banks of HBM3E for each of the Blackwell chips in the package, for a total of eight banks, and the Blackwell GPUs almost certainly use 4 GB DRAM chips stacked eight high, for a total of 64 chips. If you look very carefully at the die, you can see that it is really eight compute complexes packaged up across those two Blackwell dies, a one-to-one pairing with the banks of HBM3E memory, which is the right kind of symmetry as far as we are concerned. The combined bandwidth of that 192 GB of memory, which is being sourced from both SK Hynix and Micron Technology, as we previously reported three weeks ago in He Who Can Pay Top Dollar For HBM Memory Controls AI Training, is 8 TB/sec.
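Here is the stack math implied by those figures, under our assumption (spelled out again below) that only six of the eight physical stacks are active in the 192 GB part:

```python
chip_gb, stack_height, stacks = 4, 8, 8     # assumptions from the analysis above
stack_gb = chip_gb * stack_height           # 32 GB per eight-high stack
print(stacks * stack_gb)                    # 256 GB physically on the package
active_stacks = 192 // stack_gb             # stacks needed to reach 192 GB
print(active_stacks)                        # 6
print(round(8.0 / active_stacks, 2))        # ~1.33 TB/sec delivered per active stack
```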

The H100 from 2022 had 80 GB of memory and 3.35 TB/sec of bandwidth across five stacks, and an upgraded H100 that was paired with the "Grace" CG100 Arm server processor, also made by Nvidia, came out with six stacks at 96 GB and 3.9 TB/sec of bandwidth. If you want to be generous in your comparisons, then you compare to the original H100, and the top-end Blackwell will have 2.4X the memory capacity and 2.4X the bandwidth. If you compare to the middle H100 with all of the memory turned on, then the increase for the top-end Blackwell that Nvidia is promising to deliver this year is 2X for capacity and a little more than 2X for bandwidth. If you compare to the H200, with 141 GB of HBM3E memory and 4.8 TB/sec of bandwidth, then the capacity on the top-end Blackwell is 36.2 percent higher and its bandwidth is 66.7 percent higher.
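Those ratios check out; a quick verification of the percentages above:

```python
b_cap, b_bw = 192, 8.0                       # top-end Blackwell: GB and TB/sec
for name, cap, bw in [("H100", 80, 3.35), ("H100 96GB", 96, 3.9), ("H200", 141, 4.8)]:
    print(name, round(b_cap / cap, 2), round(b_bw / bw, 2))
# H100 2.4 2.39 | H100 96GB 2.0 2.05 | H200 1.36 1.67 (i.e. +36.2% and +66.7%)
```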

We think there is a very good chance, as we said above, that Nvidia is using 4 GB HBM3E chips in eight-high stacks, which means only six of the eight stacks are yielded and working to get to 192 GB. And that means the Blackwell package can be upgraded to 256 GB of HBM3E capacity and 13.3 TB/sec of bandwidth at some point when the other two stacks of HBM3E memory start yielding in manufacturing. This might be something that is done for both the B100 and the B200, or just for the B200 when it is paired with an Nvidia CPU. We shall see, and Nvidia is not saying at the moment.

The Blackwell complex has NVLink 5.0 ports coming off the device that deliver 1.8 TB/sec of bandwidth, which is double the port speed of the NVLink 4.0 ports on the Hopper GPUs.

As was the case with all of the recent GPU compute engines from Nvidia, performance is not just about cramming more flops into a chip and more memory to feed it. There are optimizations made to the chip's architecture to suit particular workloads. With Hopper, we saw the first iteration of the Transformer Engine, which provided an adaptive precision range for tensors to speed up calculations. And with Blackwell, there is an improved second-generation Transformer Engine that can do finer-grained scaling of precision within tensors. It is this feature, says Buck, that allows for the FP4 performance, which is largely going to be used to drive up inference throughput and therefore drive down the cost of inference for GenAI.

"The Transformer Engine as it was originally invented with Hopper, what it does is it tracks the accuracy and the dynamic range of every layer of every tensor in the entire neural network as it proceeds in computing," explained Buck. "And as the model is training over time, we are constantly monitoring the ranges of every layer and adapting to stay within the bounds of the numerical precision to get the best performance. In Hopper, this bookkeeping extends to a 1,000-way history to compute updates and scale factors to allow the entire computation to happen in only eight bits of precision. With Blackwell, we take it a step further. In hardware, we can adjust the scaling on every tensor. Blackwell supports micro tensor scaling: not just the entire tensor, which we can still monitor, but now we can look at the individual elements within the tensor. To take this a step further, Blackwell's second-generation Transformer Engine will allow us to take AI computing to FP4, or using only four bits of floating point representation, to perform the AI calculation. That's four zeros and ones for every neuron, every connection – literally the numbers one through 16. Getting down to that level of fine granularity is a miracle in itself. And the second-generation Transformer Engine does that work combined with Blackwell's micro tensor scaling, and that means we can deliver twice the amount of compute as before, we can double the effective bandwidth because going from eight bits to four bits is half the size. And of course, double the model size can fit on an individual GPU."
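To illustrate the general idea, and not Nvidia's actual hardware format, here is a minimal sketch of block-wise ("micro tensor") scaling: each small block of a tensor gets its own scale factor, so four-bit integers only have to cover the local dynamic range rather than the whole tensor's.

```python
import numpy as np

def quantize_blockwise_4bit(t, block=8):
    """Toy 4-bit quantization with one scale factor per block of values."""
    t = t.reshape(-1, block)
    scales = np.abs(t).max(axis=1, keepdims=True) / 7.0  # map each block into -7..7
    scales[scales == 0] = 1.0                            # avoid divide-by-zero
    q = np.clip(np.round(t / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    return (q * scales).reshape(-1)

x = np.random.default_rng(0).standard_normal(64).astype(np.float32)
q, s = quantize_blockwise_4bit(x)
print(np.abs(dequantize(q, s) - x).max())  # small, since each block is scaled to its own range
```

The payoff is exactly the one Buck describes: half the bits means double the effective bandwidth and twice the model capacity per gigabyte, so long as per-block scaling keeps the quantization error tolerable.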

Nvidia has not talked about the performance of the 32-bit and 64-bit CUDA cores on the Blackwell chips or how the higher precision math works out on the tensor cores in the devices. We hope to have this all in the architectural deep dive that is coming tomorrow.

What we do know is that the B100 has peak FP4 performance of 14 petaflops and fits into the same 700 watt thermal design as the H100 that preceded it. The B200 is the one that comes in at 18 petaflops at FP4 precision, and it burns 1,000 watts. We have also been told privately by Buck that the GPUs used in the forthcoming GB200 NVL72 system are liquid cooled and run at 1,200 watts. Presumably they offer more performance for that wattage. (Else why bother?) The idea that the 1,200 watt Blackwell used in the GB200 NVL72 system delivers the 20 petaflops of FP4 performance spoken of in the charts fits and doesn’t insult our intelligence.
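Running the performance-per-watt numbers on those three bins (our own arithmetic on the figures above):

```python
bins = {"B100": (14, 700), "B200": (18, 1000), "GB200 NVL72 GPU": (20, 1200)}
for name, (pflops_fp4, watts) in bins.items():
    print(name, round(pflops_fp4 * 1000 / watts, 1), "teraflops FP4 per watt")
# B100 20.0 | B200 18.0 | GB200 NVL72 GPU 16.7
```

On these figures, the hotter bins buy absolute throughput rather than efficiency, which is the usual price of pushing a given piece of silicon harder.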

Nvidia is not talking about pricing for the B100 or B200 or its HGX B100 system boards, which will plug into existing HGX H100 server designs because they have the same thermals and therefore the same heat sinks. We expect there to be at least a 25 percent premium on the price of the HGX B100 compared to the HGX H100, which would put it at around $250,000 for around 2.5X the performance, roughly speaking, at the same precision of math. That price could be, and almost certainly will be, a lot higher on the street, of course, as happened with the Hopper generation.
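If those estimates hold, the implied price/performance improvement is easy to work out; a sketch based on our own price guess above, not on any announced pricing:

```python
b100_board = 250_000                 # our guess at HGX B100 board pricing
premium = 1.25                       # assumed 25 percent uplift over the HGX H100
h100_board = b100_board / premium    # implies roughly $200,000 for the HGX H100 baseline
perf_gain = 2.5                      # iso-precision performance gain estimated earlier
print(h100_board, round(perf_gain / premium, 1))  # 200000.0 2.0 -> ~2X performance per dollar
```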

Now, on to the Blackwell systems and the NVLink Switch 4 and NVLink 5 ports... Stay tuned.



