TL;DR
A developer has created a Linux kernel module that allows consumer-grade USB4/Thunderbolt ports to function as InfiniBand devices. This enables high-speed RDMA communication between home computers, which could let distributed AI workloads run without enterprise networking gear.
A developer has built an experimental Linux kernel module that enables ordinary USB4 and Thunderbolt ports on AMD mini PCs to emulate InfiniBand devices, achieving high-speed RDMA communication suitable for AI workloads at home. This breakthrough could allow consumers to perform tensor-parallel inference and distributed training without enterprise networking gear, marking a significant step in democratizing high-performance computing.
The project involves creating a Linux kernel module that tricks USB4/Thunderbolt ports into acting as InfiniBand interfaces, enabling RDMA-over-USB4. The developer reports achieving bidirectional data transfer rates of approximately 95 Gb/s with latency around 7 microseconds, comparable to enterprise-grade InfiniBand networks. Tests include running large AI inference models and FSDP training steps across two consumer mini PCs, with performance surpassing traditional Ethernet and soft-RoCE setups. The implementation is experimental, built for research purposes, and involves loading custom kernel modules that may cause system instability. The developer emphasizes that this is not production software and lacks official support.
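On Linux, devices registered with the kernel's RDMA subsystem are exposed under `/sys/class/infiniband`. Assuming the experimental module registers its emulated interface there like a conventional InfiniBand adapter (the source does not describe the registration details, so this is an assumption), a minimal sketch for checking whether any such device is visible could look like this. The function name `list_rdma_devices` is hypothetical and chosen only for illustration.

```python
import os


def list_rdma_devices(sysfs_root="/sys/class/infiniband"):
    """Return the sorted names of RDMA devices listed under sysfs_root.

    Assumption: the emulated device shows up in the standard RDMA sysfs
    directory. The source article does not confirm this.
    """
    if not os.path.isdir(sysfs_root):
        # Directory is absent when the RDMA core is not loaded
        # or no RDMA device has been registered.
        return []
    return sorted(os.listdir(sysfs_root))


if __name__ == "__main__":
    devices = list_rdma_devices()
    print(devices if devices else "no RDMA devices found")
```

An empty result only means no RDMA device is currently registered; it says nothing about whether the hardware could support one.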
Why It Matters
This development could dramatically lower the cost and complexity of deploying high-performance AI workloads at home or in small labs. By enabling consumer hardware to emulate InfiniBand, it opens the possibility for hobbyists, researchers, and small organizations to perform tensor-parallel inference and distributed training without expensive enterprise networking gear. If further developed and stabilized, it could influence the future of distributed AI computing, making high-speed interconnects more accessible.

Background
InfiniBand is a high-speed networking technology primarily used in data centers for low-latency, high-bandwidth communication between servers. Traditionally, it requires specialized hardware and infrastructure. Recent efforts have focused on soft-RoCE and other software-defined approaches to bring similar performance to commodity hardware. The current breakthrough builds on these ideas by repurposing consumer USB4/Thunderbolt ports, which are widespread in modern PCs, to emulate InfiniBand interfaces. This approach leverages the high bandwidth and low latency of USB4/Thunderbolt, previously underutilized for such applications.
“This is experimental research code, most of it AI-generated, and it loads experimental kernel modules on machines I was willing to crash repeatedly.”
— the developer
“We built experimental RDMA-over-USB4 for 128GB Strix Halo mini PCs, enabling fast communication for AI workloads across consumer hardware.”
— the developer
What Remains Unclear
It is not yet clear how stable or scalable this approach is for long-term or production use. The implementation remains experimental, with potential issues related to system stability, compatibility, and hardware variability. Further testing is needed to determine whether this can be reliably deployed outside of research settings.
What’s Next
Next steps include refining the kernel modules for stability, testing across a wider range of hardware, and exploring integration with existing AI frameworks. Researchers and hobbyists may attempt to replicate or extend this work, while developers may work toward more robust implementations. The developer plans to continue experimenting and sharing updates as progress is made.

Key Questions
Can this be used in production now?
No, this is experimental research code not suitable for production environments. It is intended for testing and development purposes only.
What hardware is required to replicate this?
At minimum, AMD mini PCs with USB4 or Thunderbolt ports and the ability to load custom Linux kernel modules are needed. The developer used 128GB Strix Halo mini PCs for testing.
How does this compare to traditional Ethernet or Wi-Fi for AI workloads?
The developer reports significantly higher data transfer rates (~95 Gb/s) and lower latency (~7 µs) than the Ethernet setup used for comparison (~2.3 Gb/s) or Wi-Fi, which would enable faster distributed AI inference and training.
Is this compatible with existing AI frameworks?
Currently, the implementation is experimental and not integrated with mainstream AI frameworks. Future work may involve developing user-friendly interfaces or APIs for easier adoption.
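To put the reported bandwidth and latency figures in perspective, the following back-of-the-envelope sketch estimates how long a single transfer would take on each link. It is an idealized model (fixed latency plus payload size divided by link rate) and rests on several assumptions: the 1 GB payload is hypothetical, the reported ~95 Gb/s is treated as usable throughput for the transfer, and no latency is applied to Ethernet because the source gives none.

```python
def transfer_time_s(payload_bytes, link_gbps, latency_s=0.0):
    """Idealized transfer time: fixed latency plus serialization time."""
    return latency_s + (payload_bytes * 8) / (link_gbps * 1e9)


# Hypothetical payload size, chosen only for illustration.
PAYLOAD_BYTES = 1_000_000_000  # 1 GB

# Link figures as reported in the article.
usb4_rdma = transfer_time_s(PAYLOAD_BYTES, 95.0, latency_s=7e-6)
ethernet = transfer_time_s(PAYLOAD_BYTES, 2.3)  # no latency figure in the source

print(f"RDMA over USB4: {usb4_rdma:.3f} s")
print(f"Ethernet:       {ethernet:.3f} s")
print(f"Speedup:        {ethernet / usb4_rdma:.1f}x")
```

Under these assumptions the difference comes out at roughly 40x for large transfers. Real workloads will see less, since protocol overhead, memory copies, and compute time are all ignored here.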
Source: Hacker News