Key Takeaways
- $400M valuation: RadixArk, the commercial spinout of open‑source inference engine SGLang, has reportedly been valued at around $400 million in a round led by Accel.
- From lab to startup: SGLang originated in 2023 in UC Berkeley professor and Databricks co‑founder Ion Stoica’s lab and was commercialized as RadixArk in August 2025.
- High‑profile pedigree: CEO Ying Sheng, formerly at xAI and Databricks, leads a team whose tooling already powers companies such as xAI and Cursor.
- Inference boom: The spinout lands amid a funding surge for inference infrastructure, with rivals like Baseten raising $300M at a $5B valuation and Fireworks AI securing $250M at a $4B valuation.
Quick Recap
SGLang, a fast‑growing open‑source engine for running large language models, has formally spun out of UC Berkeley as RadixArk, a commercial startup valued at roughly $400 million in a funding round led by Accel, according to a TechCrunch report and an accompanying post on X. The company, announced last August, will steward SGLang while building paid hosting and enterprise products focused on cutting the cost of AI inference.
From SGLang Lab Project to RadixArk Inference Business
RadixArk is the new corporate home for SGLang, a high‑performance LLM serving framework that uses techniques like RadixAttention and aggressive key–value (KV) cache reuse to drive up tokens‑per‑second and cut latency without adding more GPUs. The project was incubated in Ion Stoica’s UC Berkeley lab and has quickly become a go‑to engine for teams at xAI, Cursor and other early adopters seeking cheaper, faster inference at scale.
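For intuition, here is a toy Python sketch of the prefix‑caching idea behind RadixAttention: requests that share a prompt prefix look up how much of their KV state is already cached and only prefill the remainder. This is an illustration of the concept, not SGLang's actual implementation, which stores token spans on radix‑tree edges and manages real GPU tensors.

```python
# Toy sketch of radix-style prefix caching (the idea behind RadixAttention).
# Requests sharing a prompt prefix reuse that prefix's cached KV state
# instead of recomputing attention keys/values from scratch.

class RadixNode:
    def __init__(self):
        self.children = {}    # token -> RadixNode (real trees compress token spans)
        self.kv_block = None  # placeholder for cached KV tensors

class PrefixCache:
    def __init__(self):
        self.root = RadixNode()

    def match_prefix(self, tokens):
        """Return how many leading tokens already have cached KV state."""
        node, matched = self.root, 0
        for tok in tokens:
            if tok not in node.children:
                break
            node = node.children[tok]
            matched += 1
        return matched

    def insert(self, tokens):
        """Record (stubbed) KV state for a token sequence."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, RadixNode())
            node.kv_block = object()  # stand-in for real KV tensors

cache = PrefixCache()
system_prompt = [1, 2, 3, 4, 5]        # shared system-prompt tokens
cache.insert(system_prompt + [10, 11])  # first request populates the cache

# Second request with the same system prompt: 5 of its 7 tokens are free.
reused = cache.match_prefix(system_prompt + [20, 21])
print(f"reused {reused} cached tokens; prefill only the remainder")
```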
The spinout formalizes that momentum. RadixArk’s latest round, led by Accel with earlier angel backing from investors including Intel CEO Lip‑Bu Tan, values the company at about $400 million, although the exact check size remains undisclosed. Co‑founder and CEO Ying Sheng, a key SGLang contributor who previously engineered production systems at xAI and worked as a research scientist at Databricks, has brought a portion of the SGLang maintainer team into the startup.
Technically, RadixArk will continue to advance SGLang as an open‑source model engine, while layering on commercial offerings. Those include Miles, an enterprise‑facing reinforcement learning framework for large‑scale post‑training workloads, and a growing portfolio of managed hosting and support services, marking a deliberate open‑core strategy. Research work such as InfiniteHiP and TokenSelect has already demonstrated SGLang‑based pipelines operating on 1M–3M token contexts, underscoring its positioning in the ultra‑long‑context, high‑throughput segment of the inference market.
Why This Spinout Matters in the Inference Land Grab
The RadixArk deal comes amid a broader re‑rating of inference infrastructure as the next major AI platform layer. Inference, the cost of running models after training, now represents a large share of AI server spend, and even modest efficiency gains can translate into multi‑million‑dollar savings for heavy users. That dynamic has turned specialized serving engines into prime venture targets.
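A quick back‑of‑envelope calculation shows why; all figures below are hypothetical, but the shape of the math is what investors are underwriting.

```python
# Back-of-envelope illustration (all numbers hypothetical): how a modest
# throughput gain from a better serving engine compounds at fleet scale.

gpu_hour_cost = 2.50    # $/GPU-hour, assumed cloud rate
fleet_gpus = 2_000      # assumed inference fleet size
hours_per_year = 24 * 365

baseline_spend = gpu_hour_cost * fleet_gpus * hours_per_year

# A 20% tokens/sec improvement means ~1/1.2 of the GPUs can serve the
# same traffic, i.e. roughly a 16.7% cost reduction.
speedup = 1.20
savings = baseline_spend * (1 - 1 / speedup)

print(f"baseline: ${baseline_spend:,.0f}/yr, savings: ${savings:,.0f}/yr")
# -> baseline: $43,800,000/yr, savings: ~$7,300,000/yr
```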
RadixArk is not alone. The vLLM project is being commercialized via Inferact, which has reportedly raised around $150 million at an $800M valuation to push open‑source vLLM into more production deployments. At the platform layer, Baseten and Fireworks AI have secured mega‑rounds at $5B and $4B valuations respectively, betting that control over high‑performance inference runtimes, KV‑cache reuse, speculative decoding and structured outputs will define the next tier of AI winners. RadixArk’s emergence signals that SGLang intends to be one of the core engines in that stack rather than merely a research curiosity.
Competitive Landscape
RadixArk (SGLang) vs Inferact (vLLM) vs Fireworks AI
- RadixArk (SGLang) – the subject of this story; an open‑core inference engine plus commercial hosting.
- Inferact (vLLM) – new startup commercializing the widely used vLLM inference engine with $150M in funding at an $800M valuation.
- Fireworks AI – independent inference platform offering hosted open‑source and proprietary LLMs with transparent, per‑token pricing.
| Feature/Metric | RadixArk (SGLang) | Inferact (vLLM) | Fireworks AI |
| --- | --- | --- | --- |
| Context Window | SGLang‑based research (e.g., InfiniteHiP/TokenSelect) has demonstrated 1M–3M token contexts; production limits are model‑dependent. | vLLM runs models like Llama 3.1 at full 128K context and supports configurations beyond 128K via long‑context features. | Offers hosted models with up to 128K+ token windows (e.g., DeepSeek R1, Qwen2.5‑VL, long‑context Qwen/Gemma variants). |
| Pricing per 1M Tokens | Not publicly listed; early revenue driven by custom enterprise hosting and support contracts. | Not publicly disclosed; expected to follow enterprise / volume contracts around vLLM deployments. | Transparent usage pricing: entry‑tier models from $0.10 per 1M tokens, high‑end models up to roughly $0.90–$1.20 per 1M tokens. |
| Multimodal Support | RadixAttention design can be extended to image tokens, enabling multimodal models on SGLang; focus today is text‑centric workloads. | vLLM supports multimodal inference patterns (e.g., image‑text LLaVA‑style setups) via backend integrations. | Broad catalog of text and vision‑language models; Qwen2.5‑VL 32B, Gemma 3 and others offer 128K‑token multimodal contexts. |
| Agentic Capabilities | SGLang natively targets complex agent workflows (ReAct, tree‑of‑thought, multi‑step programs) with KV‑cache reuse for branches. | Optimized for long‑running, high‑throughput inference; widely used as the engine under many agent frameworks, though orchestration is external. | Platform emphasizes models with tool‑use and structured output; several hosted models are explicitly tuned for agentic tool‑calling scenarios. |
RadixArk appears strongest on agent‑style workloads and ultra‑long context research thanks to SGLang’s RadixAttention and KV‑reuse, while Inferact is better positioned for organizations standardizing on vLLM across heterogeneous hardware at massive scale. Fireworks AI remains the most cost‑transparent and turnkey option, especially for teams that want immediate access to long‑context, multimodal models with clear per‑token pricing rather than managing their own serving stack.
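To make the agentic point concrete, here is a short sketch in SGLang’s frontend DSL, modeled on the project’s published examples; treat the exact function and parameter names as indicative rather than canonical, since the API evolves.

```python
# Sketch of a branching agent program in SGLang's frontend DSL, based on
# the project's published examples; check current docs before relying on
# specific names or signatures.
import sglang as sgl

@sgl.function
def explore_plans(s, task):
    s += sgl.user(f"Task: {task}\nPropose a plan.")
    # fork() creates parallel branches; RadixAttention lets them share
    # the prompt's KV cache, so the common prefix is not recomputed
    # once per branch.
    branches = s.fork(2)
    branches[0] += sgl.assistant(sgl.gen("plan", max_tokens=128, temperature=0.2))
    branches[1] += sgl.assistant(sgl.gen("plan", max_tokens=128, temperature=0.9))
    branches.join()

# Usage (assumes a local SGLang server at its default port):
# sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
# state = explore_plans.run(task="summarize quarterly metrics")
```

The design point is that tree‑of‑thought and multi‑branch agent programs naturally share long prefixes, which is exactly the access pattern KV‑cache reuse rewards.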
TechViral’s Takeaway
In my experience, these kinds of spinouts are where the real infrastructure platforms emerge, and RadixArk looks like a serious contender rather than just another AI tooling startup. I think this is a big deal because a $400M valuation for what is essentially an optimized inference engine validates just how central cost‑efficient serving has become to AI economics. While I generally prefer platforms with transparent per‑token pricing for everyday builders, RadixArk’s open‑core approach of keeping SGLang free while monetizing managed hosting and advanced tooling strikes me as structurally bullish for both enterprise adoption and the broader open‑source ecosystem. If the team can convert its performance lead on complex, agentic, long‑context workloads into a sticky commercial platform, this move should be net positive for developers. It also underscores that in the AI stack, whoever owns inference efficiency will own a big slice of the value.