
4 Conclusions

The tests and integration concepts presented here are based on a "first look" at the hardware. More investigation is needed with additional hardware. It is, however, possible to draw some initial conclusions.
  1. Composing systems for HPC seems to work. In the tests performed, there did not appear to be a loss of performance when GPU-based resources were added.
  2. Integration with existing resource schedulers (e.g., Slurm) seems possible; however, more work is needed to create a production-ready environment. This "masquerade" approach lets users think about machines, not configurations, when running jobs.
  3. When a scheduler is used to configure the PCIe fabric, more investigation into safe switch reconfiguration is needed. For example, the scheduler must ensure that a new PCIe configuration does not change the PCIe configuration of any other node while that node is running. This PoC did not address this issue.
  4. While rebooting servers does work, server boot times can be annoyingly long. In addition, some sites prefer not to reboot servers unless absolutely necessary, which may limit some of the methods explored here. Once a rapid and standard PCIe bus rescan is available, it is expected to remove the need to reboot systems and to make scripts such as the Slurm suspend and resume scripts much more efficient.
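Item 3 describes a safety requirement rather than an implementation. As a minimal sketch, assuming a simplified model in which each fabric device (e.g., a GPU) is attached to at most one node and the scheduler knows which nodes are currently running jobs, the check can be expressed as a comparison of the current and proposed device assignments. The function name `unsafe_changes`, the device and node names, and the data model are hypothetical; they are not part of the PoC software or of any GigaIO or Slurm API.

```python
from typing import Dict, List, Optional, Set

# Hypothetical model: device name -> node it is attached to,
# or None if the device sits unassigned in the free pool.
Assignment = Dict[str, Optional[str]]


def unsafe_changes(current: Assignment,
                   proposed: Assignment,
                   running: Set[str]) -> List[str]:
    """Return the devices whose reassignment would disturb a running node.

    A change is treated as unsafe if it moves a device away from, or onto,
    any node that is currently running a job.
    """
    conflicts = []
    for device in sorted(set(current) | set(proposed)):
        before = current.get(device)
        after = proposed.get(device)
        if before == after:
            continue  # device is untouched by the new configuration
        if before in running or after in running:
            conflicts.append(device)
    return conflicts


if __name__ == "__main__":
    current = {"gpu0": "node1", "gpu1": "node1", "gpu2": None, "gpu3": "node2"}
    # Request: compose node3 with two GPUs, one of them taken from node2.
    proposed = {"gpu0": "node1", "gpu1": "node1", "gpu2": "node3", "gpu3": "node3"}
    running = {"node1", "node2"}
    print(unsafe_changes(current, proposed, running))  # ['gpu3']
```

Under this model the node being recomposed must itself be idle (or suspended) before its devices change, which is consistent with the reboot-based approach described in item 4. A real implementation would also need to account for switch-level details that this sketch ignores.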

The author would like to thank GigaIO Networks for the use of, and assistance with, their hardware. The Beowulf Foundation's mission is to foster and support advanced technical computing (HPC, Data Analytics, Artificial Intelligence, etc.) through commodity and open source-driven innovation and open collaboration.

All software is available by contacting the author and is expected to be on the Beowulf Foundation GitHub by the time of publication.

References

  1. CXL - Compute Express Link
  2. Slurm Page
  3. Slurm Power Control

This work is licensed under CC BY-NC-SA 4.0

©2005-2023 Copyright Seagrove LLC, Some rights reserved. Except where otherwise noted, this site is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International. The Cluster Monkey Logo and Monkey Character are Trademarks of Seagrove LLC.