TL;DR
A developer has created an experimental Linux kernel module that lets consumer-grade USB4/Thunderbolt ports present themselves as InfiniBand devices. This enables high-speed RDMA communication between home computers, which could make distributed AI workloads practical without enterprise networking gear.
A developer has built an experimental Linux kernel module that lets ordinary USB4 and Thunderbolt ports on AMD mini PCs emulate InfiniBand devices, providing RDMA communication fast enough for AI workloads at home. If the approach holds up, consumers could run tensor-parallel inference and distributed training without enterprise networking gear, making high-performance computing more widely accessible.
The project is a Linux kernel module that makes USB4/Thunderbolt ports appear as InfiniBand interfaces, enabling RDMA-over-USB4. The developer reports bidirectional data transfer rates of approximately 95 Gb/s with latency around 7 microseconds, figures described as comparable to enterprise-grade InfiniBand networks. Tests include running inference on large AI models and FSDP training steps across two consumer mini PCs, with reported performance surpassing traditional Ethernet and soft-RoCE setups. The implementation is experimental and built for research purposes, and it requires loading custom kernel modules that may cause system instability. The developer emphasizes that this is not production software and has no official support.
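To put the reported latency and throughput in perspective, here is a minimal sketch of the usual latency-plus-bandwidth estimate for a single message. It uses only the two figures quoted above; treating ~95 Gb/s as the usable one-way rate is an assumption, and the function names are hypothetical, not part of the project.

```python
# Latency-bandwidth estimate of one-way transfer time.
# The two constants are the figures reported in the article; treating
# 95 Gb/s as the usable one-way rate is an assumption.
LATENCY_S = 7e-6        # ~7 microseconds
BANDWIDTH_BPS = 95e9    # ~95 Gb/s, in bits per second


def transfer_time_s(num_bytes: int,
                    latency_s: float = LATENCY_S,
                    bandwidth_bps: float = BANDWIDTH_BPS) -> float:
    """Estimated delivery time: fixed latency plus serialization time."""
    return latency_s + (num_bytes * 8) / bandwidth_bps


def crossover_bytes(latency_s: float = LATENCY_S,
                    bandwidth_bps: float = BANDWIDTH_BPS) -> float:
    """Message size at which serialization time equals the fixed latency."""
    return latency_s * bandwidth_bps / 8


if __name__ == "__main__":
    print(f"crossover ~ {crossover_bytes() / 1e3:.1f} kB")
    for size in (4 * 1024, 1024 * 1024, 256 * 1024 * 1024):
        print(f"{size:>12} bytes -> {transfer_time_s(size) * 1e6:10.1f} us")
```

Under these assumptions, messages well below the crossover size (about 83 kB) are dominated by the fixed latency, while larger transfers are dominated by bandwidth. This is why both numbers matter for workloads that mix small synchronization messages with large tensor transfers.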
Why It Matters
This development could dramatically lower the cost and complexity of deploying high-performance AI workloads at home or in small labs. By enabling consumer hardware to emulate InfiniBand, it opens the possibility for hobbyists, researchers, and small organizations to perform tensor-parallel inference and distributed training without expensive enterprise networking gear. If further developed and stabilized, it could influence the future of distributed AI computing by making high-speed interconnects more accessible.

Background
InfiniBand is a high-speed networking technology primarily used in data centers for low-latency, high-bandwidth communication between servers. Traditionally, it requires specialized hardware and infrastructure. Recent efforts have focused on soft-RoCE and other software-defined approaches to bring similar performance to commodity hardware. The current breakthrough builds on these ideas by repurposing consumer USB4/Thunderbolt ports, which are widespread in modern PCs, to emulate InfiniBand interfaces. This approach leverages the high bandwidth and low latency of USB4/Thunderbolt, previously underutilized for such applications.
“This is experimental research code, most of it AI-generated, and it loads experimental kernel modules on machines I was willing to crash repeatedly.”
— the developer
“We built experimental RDMA-over-USB4 for 128GB Strix Halo mini PCs, enabling fast communication for AI workloads across consumer hardware.”
— the developer
What Remains Unclear
It is not yet clear how stable or scalable this approach is for long-term or production use. The implementation remains experimental, with potential issues related to system stability, compatibility, and hardware variability. Further testing is needed to determine whether it can be reliably deployed outside of research settings.
What’s Next
Next steps include refining the kernel modules for stability, testing across a wider range of hardware, and exploring integration with existing AI frameworks. Researchers and hobbyists may attempt to replicate or extend this work, while developers may work toward more robust implementations. The developer plans to continue experimenting and sharing updates as progress is made.

Key Questions
Can this be used in production now?
No, this is experimental research code not suitable for production environments. It is intended for testing and development purposes only.
What hardware is required to replicate this?
At minimum, two AMD mini PCs with USB4 or Thunderbolt ports, running Linux with the ability to load custom kernel modules. The developer used 128GB Strix Halo mini PCs for testing.
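For anyone attempting to replicate this, a quick sanity check is whether the kernel exposes any RDMA device at all. On Linux, devices registered with the kernel's RDMA subsystem appear under `/sys/class/infiniband`. The sketch below simply lists that directory; whether this project's module registers a device there, and under what name, is an assumption, and `list_rdma_devices` is a hypothetical helper.

```python
from pathlib import Path


def list_rdma_devices(sysfs_root: str = "/sys/class/infiniband") -> list[str]:
    """Return the names of RDMA devices listed under sysfs_root.

    Returns an empty list if the directory does not exist, for example
    when no RDMA driver is loaded.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    devices = list_rdma_devices()
    if devices:
        print("RDMA devices:", ", ".join(devices))
    else:
        print("No RDMA devices found")
```

An empty result after loading the module would suggest the device did not register, which is worth ruling out before debugging anything at the framework level.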
How does this compare to traditional Ethernet or Wi-Fi for AI workloads?
The reported figures (~95 Gb/s throughput, ~7 µs latency) represent far higher data rates and lower latency than Ethernet (~2.3 Gb/s) or Wi-Fi, enabling faster distributed AI inference and training.
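As a rough illustration of what that gap means for bulk transfers, the sketch below compares the two quoted rates. It considers bandwidth only and ignores latency and protocol overhead, since no Ethernet latency figure is given; the function names are hypothetical.

```python
GIB_BITS = 8 * 1024**3  # bits in one GiB

USB4_RDMA_GBPS = 95.0   # reported RDMA-over-USB4 rate
ETHERNET_GBPS = 2.3     # Ethernet rate quoted in the article


def bulk_seconds(num_gib: float, rate_gbps: float) -> float:
    """Bandwidth-only time to move num_gib GiB at rate_gbps Gb/s."""
    return num_gib * GIB_BITS / (rate_gbps * 1e9)


def speedup(fast_gbps: float, slow_gbps: float) -> float:
    """Ratio of the two link rates."""
    return fast_gbps / slow_gbps


if __name__ == "__main__":
    for name, rate in (("USB4 RDMA", USB4_RDMA_GBPS),
                       ("Ethernet", ETHERNET_GBPS)):
        print(f"{name:>10}: {bulk_seconds(1, rate):.3f} s per GiB")
    print(f"ratio: {speedup(USB4_RDMA_GBPS, ETHERNET_GBPS):.1f}x")
```

On these numbers the ratio is roughly 41x, so a transfer that takes several seconds over the quoted Ethernet rate would take a fraction of a second at the reported RDMA rate.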
Is this compatible with existing AI frameworks?
The developer reports running inference and FSDP training steps across two machines, but the implementation is experimental and has no official integration with mainstream AI frameworks. Future work may involve developing user-friendly interfaces or APIs for easier adoption.
Source: Hacker News