It’s here: the technology preview of MAX GPU 👨🚀 @media only screen and (max-width:639px){img.stretch-on-mobile,.hs_rss_email_entries_table img,.hs-stretch-cta .hs-cta-img{height:auto !important;width:100% !important} .display_block_on_small_screens{display:block}.hs_padded{padding-left:20px !important;padding-right:20px !important} .hs-hm,table.hs-hm{display:none}.hs-hd{display:block !important}table.hs-hd{display:table !important} }.moz-text-html .hse-column-container{max-width:600px !important;width:600px !important} .moz-text-html .hse-column{display:table-cell;vertical-align:top}.moz-text-html .hse-section .hse-size-6{max-width:300px !important;width:300px !important} .moz-text-html .hse-section .hse-size-12{max-width:600px !important;width:600px !important} @media only screen and (min-width:640px){.hse-column-container{max-width:600px !important;width:600px !important} .hse-column{display:table-cell;vertical-align:top}.hse-section .hse-size-6{max-width:300px !important;width:300px !important} .hse-section .hse-size-12{max-width:600px !important;width:600px !important} }@media only screen and (max-width:639px){ul,blockquote{margin:0;padding:1em 40px} }@media screen and (max-width:639px){.social-network-cell{display:inline-block} }#hs_body #hs_cos_wrapper_main a[x-apple-data-detectors]{color:inherit !important;text-decoration:none !important;font-size:inherit !important;font-family:inherit !important;font-weight:inherit !important;line-height:inherit !important} a{text-decoration:underline}p{margin:0}body{-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%;-webkit-font-smoothing:antialiased;moz-osx-font-smoothing:grayscale} table{border-spacing:0;mso-table-lspace:0;mso-table-rspace:0}table,td{border-collapse:collapse} img{-ms-interpolation-mode:bicubic}p,a,li,td,blockquote{mso-line-height-rule:exactly}
MAX 24.6 is live, featuring the MAX GPU preview, built on MAX Engine and MAX Serve with NVIDIA GPU support, nightly docs, & more!
Three years ago, we began reimagining AI development by rebuilding its infrastructure to be more performant, programmable, and portable. Today, we’re introducing MAX 24.6, featuring MAX GPU—a preview of the first vertically integrated generative AI serving stack that eliminates dependency on vendor-specific libraries like NVIDIA CUDA. MAX GPU is built on two groundbreaking technologies:
MAX Engine: A high-performance AI model compiler and runtime supporting vendor-agnostic Mojo GPU kernels for NVIDIA GPUs.
MAX Serve: A Python-native serving layer engineered for LLMs, handling complex request batching and scheduling for reliable performance under heavy workloads.
Learn more about the preview of MAX GPU, how it can transform your AI workflows, and what’s coming in 2025 in the blog post.
Build a Continuous Chat Interface with Llama 3 and MAX Serve 🦙
Build a production-ready chat application using Llama 3 and MAX GPU, our new vertically integrated serving stack that delivers high-performance inference without vendor-specific dependencies. From managing tokens with rolling context windows to handling concurrent requests and deploying via Docker Compose, this tutorial provides all the tools you need to create a scalable, GPU-accelerated chat solution.
The latest performance benchmarks for MAX Serve are here, showcasing strong throughput with the ShareGPTv3 dataset. This post takes a closer look at key factors like memory efficiency, batch size, and GPU utilization, and explores how PagedAttention will enhance future performance. Check out the full analysis to see how MAX Serve compares to vLLM and get a sneak peak into future optimizations.
在这里:MAX GPU 👨🚀 @media only 屏幕和 (max-width:639px){img.stretch-on-mobile,.hs_rss_email_entries_table img,.hs-stretch- 的技术预览cta .hs-cta-img{高度:自动!重要;宽度:100%!重要} .display_block_on_small_screens{显示:块}.hs_padded{padding-left:20px !重要;padding-right:20px !重要} .hs-hm,table.hs-hm{显示:无}.hs-hd{显示:块!重要}table.hs-hd{显示:表!重要} }.moz-text-html .hse-column-container{最大宽度:600px!重要;宽度:600px!重要} .moz-text-html .hse-column{显示:表格单元格;垂直对齐:顶部}.moz-text-html .hse-section .hse-size-6{最大宽度:300px!重要;宽度:300px!重要} .moz-text-html .hse-section .hse-size-12{max-width:600px !important;width:600px !important} @media 仅屏幕和 (min-width:640px){.hse-column-container {最大宽度:600px!重要;宽度:600px!重要} .hse-column{显示:表格单元格;垂直对齐:顶部}.hse-section .hse-size-6{最大宽度:300px!重要;宽度:300px!重要} .hse-section .hse-size -12{max-width:600px !important;width:600px !important} }@media 仅屏幕和(max-width:639px){ul,blockquote{margin:0;padding:1em 40px} }@media screen 和 (max-width:639px){.social-network-cell{display:inline-block} }#hs _body #hs_cos_wrapper_main a[x-apple-data- detectors]{颜色:继承!重要;文本装饰:无!重要;字体大小:继承!重要;字体系列:继承!重要;字体粗细:继承!重要;行高:继承!重要} a{text-decoration:underline}p{margin:0}body{-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%;-webkit-font-smoothing:antialiased;moz -osx-字体平滑:灰度}表{border-spacing:0;mso-table-lspace:0;mso-table-rspace:0}表,td{border-collapse:collapse} img{-ms-interpolation-mode:bicubic}p,a,li ,td,blockquote{mso-line-height-rule:完全}
MAX 24.6 已上线,具有 MAX GPU 预览版,基于 MAX Engine 和 MAX Serve 构建,支持 NVIDIA GPU、夜间文档等!
[查看于浏览器](https://d2Sj-804.na1.hubspotlinks.com/Ctc/DP+113/d2Sj-804/MWBcGwCLKC 1W8MK-6q2vmc-XW2dmkKk5pJnl1N4Hzmf45kBVqW6N1X8z6lZ3nCW5GSBgm7C8yHSW4gx67J7HrPP PN3WXK0rc1qStW333xh217-GBZW8bXv7G6NphzqW7wVS2j68lKlxW1cqcC68BW2wgN4GXPMMnmjWG VszVF36nk7LzN8Bf4vQcWjsNW2CNykH3k-SsYW8ymF0m7T4B2XW3Qz_NR6LylMpN9h1T0SFcrBPW2_ Pr416HztVFW4ktwSw4pt0WtVBkrzQ2Q1cDRW2JWpmb4bT4zhW7m7dw24ZS71bN7yznJ7B2GS2MW-W DSWx-J7W9kq-5C3FF8G1VfdP8b7qR11_W6w88D739mjHkW9fQGmc4d4rSCW6jBN-T765VP7W3dD2_ w8z9d8rM2k1G7CTw98Vb4H3H8f_Q_lW5gsGLM5c5Zz4N7D9Y_GYj9tqW3hxM9w8xHLj-W3y2NhG3c l5W7W81llJy6VcDVzN5mYXpqH9MXDW3jT0gC86x5cBW13kBFr91VHD7W6dYn7K41012Qf4ZSgpY04)
[![模块化](https://hs-24141518.f.hubspotemail.net/hub/24141518/hubfs/Group%201%20(5).png?width=1120\&upscale=true\&name=Group %201%20(5) .png)](https://d2Sj-804.na1.hubspotlinks.com/Ctc/DP+113/d2Sj-804/MWBcGwCLKC1W8MK-6q2vmc-XW2dmkKk5pJnl1N4Hzmf43m2ndW6N1vHY6lZ3nvW3jFTQT2zBdD- W486FMg1G8V2rW5XX92r2vcRy1W16kDTX2_rN2MW2vn_sR1y96_hW3H3Whw36bWNNW4Ztk bh19m_20W2P0sjM11D5_JW6Hcwkq8YxXPvW5G73l-6jjfL5W5pqZS_1YQl58W3GCLC81K0 36GN4Z4YjC1MYcyW67GP9p8KdsDjW3ST8fl5BW0S5W52BHq510Vc2tW5PRvw12WmrKFW1n WHsJ64bqQNW5J4q4F4mD-sQW62M9Wn7kqx3nW7LfD1H4rKzbqW5mNB4X8YJg2Ddf7Qqs04)
[!MAX(宇航员)携带GPU行走在雪原中,周围都是白雪山。](https: //d2Sj-804.na1.hu bspotlinks.com/Ctc/DP+113/d2Sj-804/MWBcGwCLKC1W8MK-6q2vmc-XW2dmkKk5pJnl1N4Hzmgg3qn 9gW95jsWP6lZ3kDW3BQTcw6k5b_yW15LgHY4M9-zbW1kqGZ72cfnq8W4HrHyP9lrcgPW4spck43XJwMyW5 Vzhc01WSfy7W2WDK0W8QBGfQW8vzmPV7_5zsjW4mp5Vz5Yd5wfW48W-wf17lBDLW7fBz5D1hP72MW3HXD- P4RHrptW7tvj2h8JHh5TW2fQlCM3Tgt3zMNtc04l8nLKVZZS0W2-QsJkW4Gt6GD52dM08W7Rjj833zZ10N W8zr77v1Q6MD4W2MqNtZ21X4QgW5y4VSZ1xBGKKW8P9XxG8GWTT8W51kSlr57ZDWyVSmLgg5LLPt_W4NVw tk74nNKJW2_v0Qy8J78kLW5y5kVL5Z0TByV6x_cx6tzQ_SW3PPDKM2pbMfnW5J8S1V1XycDCf1hGPgd04)
三年前,我们开始重新构想人工智能的开发,重建其基础设施,使其更加高性能、可编程和便携。今天,我们推出 MAX 24.6,具有 MAX GPU,这是第一个垂直集成的生成式 AI 服务堆栈的预览,消除了对 NVIDIA CUDA 等供应商特定库的依赖。 MAX GPU 基于两项突破性技术构建:
MAX Engine:高性能 AI 模型编译器和运行时,支持 NVIDIA GPU 的供应商无关的 Mojo GPU 内核。
MAX Serve:专为法学硕士设计的 Python 原生服务层,处理复杂的请求批处理和调度,以在繁重的工作负载下实现可靠的性能。
在博客文章中了解有关 MAX GPU 预览版、它如何改变您的 AI 工作流程以及 2025 年即将推出的内容的更多信息。
[认识 MAX GPU 👨🚀](https://d2Sj-804.na1.hubspotlinks.com/Ctc/DP+113/d2Sj-804/MWBcGwCLKC1W8MK-6q2vmc-XW2dmkKk5pJnl1N4Hzmgg3qn9gW95jsWP6lZ3pfW7b gLzw8Ks4ggW4WQ3rd2hyqxNW1bHWCR3QDyQjW8Z0Y5l13F1lnW8Q_Gt73_WsNHW1k hr284slVvqW4prX9Z751gkQVc8hbk76ZqZ6W5D0mbZ1rdJ0RW3fjF3D1HtHz1N4sRr cd8J4P-W7j3sPr7PpsRyW4P8l_4129M8HW5Znn419k0snpW3xMGXd3wkQCyW41gsq x28P8DvW84xZll9kLLY8W2zxsMH86H-8nW1X7Mvr1lf32sV1XrLj4qWzSSW3_Dp442 0K8ggW5v27B76qzwfxW61bJ9d5Dv1tTW28W2W51fcbn-W7NcNdQ5Zg95bW19mDdp9 7KK9rW78cgVb1g7_rKVnQsL7385q_zW4LyxvR84w5_zW8RGtc-83BsDsf45636W04)
使用 Llama 3 和 MAX Serve 构建持续聊天界面 🦙
[![MAX 旁边的一只骆驼(一个宇航员)](https://hs-24141518.f.hubspotemail.net/hub/24141518/hubfs/Llama\_MAX.png?width=520\&upscale=true\&name=Llama\_MAX.png)](https://d2Sj- 804.na1.hubspotlin ks.com/Ctc/DP+113/d2Sj-804/MWBcGwCLKC1W8MK-6q2vmc-XW2dmkKk5pJnl1N4Hzmgg3qn9gW95j sWP6lZ3l2VDKjqQ3Jy-YFW738v3d266KlZW7pbRNx13sF9xN2JGdmskTJzGW5g-yBh2L7-hGW5CFmPq5h LTqbW6xzqy42vbNxpW5zM61t3n00LtW3h7VvM6QbGk-VPQVzB1g_y0GW2N1LbZ6_n8QYW371_Mp1Y4Fh fW8GFxKn2kSX3jW68q7rS2Tr20MW9gct9F3Z8kygW2hVRxL2Y-9xKW15Kw-q6f0PRZW4yQ_ny8ZDPS-W4 N_RHh1KBxY4N1vb6BPG5MQWW1-LkkD5rQs3JW3ZWQqR4PqzSWW1dC2BT3WplBjW1ZFPDm9fZdbxW53bYg R1sbv42W1m3XVB20CsSPW91GFlP4SRbWbN12LshZxM6-lW4XJ5xz7m2jfFW90JwfN8b22Nzf2NN8YR04)
使用 Llama 3 和 MAX GPU 构建可立即投入生产的聊天应用程序,这是我们全新的垂直集成服务堆栈,可提供高性能推理,无需特定于供应商的依赖性。从使用滚动上下文窗口管理令牌到处理并发请求以及通过 Docker Compose 进行部署,本教程提供了创建可扩展、GPU 加速的聊天解决方案所需的所有工具。
[构建您自己的聊天应用程序💬](https://d2Sj-804.na1.hubspotlinks.com/Ctc/DP+113/d2Sj-804/MWBcGwCLKC1W8MK-6q2vmc-XW2dmkKk5pJnl1N4Hzmgg3qn9gW95jsWP6lZ3pzW5qd Kq81pM3jCW6yLMMM2G-MrsVTtgfy7RtwKhVt16Pd1LbppdW2QLVKF5VfttgW3_g_V_6SFZmbW95RQdv82wftsW5sVBZQ1P1NrlW27-T525d8lz9W2ZkB_q6CRwDMMl2kv 1CW1PYW3lJ8y116QcT3N52ymqBdPWvmN3k9FkCGg_12W6MvbdN587gfVW5HMdQn5 qpydQW28JyH730GjbJW1z2Vw213HLBxW6q2P098ZnG8nW7qgGCR6gm1C-V6f0Tm4B 1x4VW6Wby4M2mjFJfW5MCgS-1Hl92nW1qFN7Q6RvxycW62NzJN29zVW7N9bTF2C_F jQRW4Ls65S4Y8cyXW2QpXL32f7mjBW3fF9Dy49SjM1W4JHPh-2Zt8Ftf9kr3Wx04)
[![246-perf-hero-image-square](https://hs-24141518.f.hubspotemail.net/hub/24141518/hubfs/246-perf-hero-image-square.png?width=520\ &upscale=true&name=246-perf-hero-image-squa re.png)](https://d2Sj-804.na1.hubspotlinks.com/Ctc/DP+113/d2Sj-804/MWBcGwCLKC1W8MK-6q2v mc-XW2dmkKk5pJnl1N4Hzmgg3qn9gW95jsWP6lZ3mqN66XdSz502PjW8Nzk7m7n7HBmW71m3ls6TJdfkW8dP8Mg1 YD6J4W7z-F845RtLTqVhXZ0T2cf8-SW3sb_mB32WnBKW4fBtFs1XZwLhW3QG3Xv274WPyW1zSzdK91RF2HW77rj 4y2y5-6nV60kCD92Z64gW4hQ3cc3y-36tW1F0KfQ3RdbK3W8YJSPZ2FYywGW4KnLkg815xz0W5Z79V61dKqZFW1 8V35n3NRhm3W8xKLSP1gfCcfN8jKWK5r6wC9W27CdzV6p8sz7W3kv6pZ59qf16W6j2-jp54D3QjW4trnxk2f9mb zW49pGKk6mc8nFW6QpYV07CsSmSW5pGShz2fD-WwN6rJt5VytKY7W15KZLY9ltlZLW6pyfSZ2vqG6gf5y1LD604)
MAX Serve 的最新性能基准在这里,展示了 ShareGPTv3 数据集的强大吞吐量。这篇文章仔细研究了内存效率、批量大小和 GPU 利用率等关键因素,并探讨了 PagedAttention 将如何提高未来的性能。查看完整的分析,了解 MAX Serve 与 vLLM 的比较,并深入了解未来的优化。
[查看 MAX 发球实际操作🏃](https://d2Sj-804.na1.hubspotlinks.com/Ctc/DP+113/d2Sj-804/MWBcGwCLKC1W8MK-6q2vmc-XW2dmkKk5pJnl1N4Hzmgg3qn9gW95jsWP6lZ3l0W1wz 3nM6hkvWwW20VmVS62Kx_dW8ck03p7Fk0PVN5XVRGB3FJgKN4KJ78_TkSCSW29HS C66lWtSTW4t2pDs6p2LZyW4MhzpG6VL8NxW26f_YB450y5qW1m_QQc93kk6hW3jvk fw4NLX07VBSmds7fYpt_N7cB8Dx3fPGXN6G7xZLW_jVLW6RjvTd45P2C1W1LF6vv 4VhnjYW5zfv263_qgwyW31DcJ02rzY03N6nw2mryls1-W3yylHw6Fcl8_Vz0N434V S5SDW4wHdcy4g1f0KW1gYnvk7rmJ3HVcDN9r7DGgH_W6_TQjm6xkvS8W4y4y5h4Wc Nm-W2vXsXT46Fcb4W24bTcX9fdYs7W1JQYQ78kH4JfW8SZFcY2VW8jLf4dbPRC04)
发布者