Biography


Roger Waleffe
I'm an Applied Deep Learning Research Scientist at NVIDIA, where I work on efficient large language model architectures, training, and inference.

I received my Ph.D. in computer science from the University of Wisconsin-Madison under the supervision of Prof. Theodoros (Theo) Rekatsinas (now at Apple). I also worked closely with Prof. Shivaram Venkataraman. My research focused on the intersection of systems and algorithmic challenges in I/O-aware and resource-efficient training of large-scale ML models. In particular, my dissertation studied Algorithms and Systems for Scalable Machine Learning over Graphs.

Before that, I was an undergraduate at UW-Madison where I studied Applied Mathematics, Engineering, and Physics (AMEP). I worked with Prof. Cary B. Forest at the Wisconsin Plasma Physics Laboratory.


Selected Publications


Language Models
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models.
NVIDIA
2025.
An Empirical Study of Mamba-based Language Models.
Waleffe, R., Byeon, W., Riach, D., Norick, B., Korthikanti, V., Dao, T., Gu, A., Hatamizadeh, A., Singh, S., Narayanan, D., Kulshreshtha, G., Singh, V., Casper, J., Kautz, J., Shoeybi, M., Catanzaro, B.
2024.
Graphs
Armada: Memory-Efficient Distributed Training of Large-Scale Graph Neural Networks.
Waleffe, R., Sarda, D., Mohoney, J., Vlatakis-Gkaragkounis, E., Rekatsinas, T., Venkataraman, S.
2025.
MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks.
Waleffe, R., Mohoney, J., Rekatsinas, T., Venkataraman, S.
18th European Conference on Computer Systems (EuroSys ’23). 2023.
Demo of Marius: A System for Large-scale Graph Embeddings.
Xie, A., Carlsson, A., Mohoney, J., Waleffe, R., Peters, S., Rekatsinas, T., Venkataraman, S.
Proceedings of the VLDB Endowment, 14(12). 2021.
Marius: Learning Massive Graph Embeddings on a Single Machine.
Mohoney, J., Waleffe, R., Xu, Y., Rekatsinas, T., Venkataraman, S.
15th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’21). 2021.
Other CS
Chameleon: A Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models.
Jiang, W., Zeller, M., Waleffe, R., Hoefler, T., Alonso, G.
Proceedings of the VLDB Endowment, 18(1). 2024.
Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning.
Okanovic, P.*, Waleffe, R.*, Mageirakos, V., Nikolakakis, K. E., Karbasi, A., Kalogerias, D., Gürel, N. M., Rekatsinas, T.
The Twelfth International Conference on Learning Representations (ICLR). 2024.
*Equal contribution.
Principal Component Networks: Utilizing Low-Rank Activation Structure to Reduce Parameters Early in Training.
Waleffe, R., Rekatsinas, T.
ACM/IMS Journal of Data Science. 2023.

Education


Ph.D. - Computer Science
University of Wisconsin-Madison
Sep. 2019 - Dec. 2024
Overall GPA: 4.00/4.00
M.S. - Computer Science
University of Wisconsin-Madison
Sep. 2019 - May 2022
Overall GPA: 4.00/4.00
B.S. - Applied Mathematics, Engineering, and Physics (AMEP)
B.S. - Computer Science
University of Wisconsin-Madison
Sep. 2015 - May 2019
Overall GPA: 4.00/4.00

Selected Experience & Awards


UW-Madison CS Department Graduate Research Fellowship, 2019
Goldwater Scholarship, 2018
Software Development Engineer - Intern
Amazon
Summer 2018
Undergraduate Researcher and Engineer
Wisconsin Plasma Physics Lab at UW-Madison
Jan. 2016 - May 2018