April 16, 2023 - Coimbra, Portugal
6th Workshop on Hot Topics in Cloud Computing Performance (HotCloudPerf 2023)
at ICPE 2023
Overview
The HotCloudPerf workshop offers a meeting venue for academics and practitioners, from experts to trainees, in the field of cloud computing performance. The new understanding of cloud computing covers the full computational continuum from data centers to edge resources to IoT sensors and devices. The workshop aims to engage this community and to lead to the development of new methodological aspects for gaining a deeper understanding not only of cloud performance, but also of cloud operation and behavior, through diverse quantitative evaluation tools, including benchmarks, metrics, and workload generators. The workshop focuses on novel cloud properties such as elasticity, performance isolation, dependability, and other non-functional system properties, in addition to classical performance-related metrics such as response time, throughput, scalability, and efficiency.
Acknowledgement
The HotCloudPerf workshop is technically sponsored by the Standard Performance Evaluation Corporation (SPEC)’s Research Group (RG) and is organized annually by the RG Cloud Group. HotCloudPerf has emerged from the series of yearly meetings that the RG Cloud Group has organized since 2013. The RG Cloud Group takes a broad approach, relevant for both academia and industry, to cloud benchmarking, quantitative evaluation, and experimental analysis.
Proceedings
The workshop proceedings are now available in the ICPE Companion proceedings.
Program
All times in GMT.
09:00 Opening
Session 1: Understanding and Explaining the Cloud
09:10 Keynote 1: “Using Cloud Native Technologies to Understand the Performance of Cloud Native Technologies” by Cristian Klein
10:10 Floriment Klinaku, Sandro Speth, Markus Zilch and Steffen Becker. Hitchhiker’s Guide for Explainability in Autoscaling.
10:30 Morning coffee break
Session 2: Machine Learning and Microservices
11:00 Xiaoyu Chu, Sacheendra Talluri, Alexandru Iosup and Laurens Versluis. How Do ML Jobs Fail in Datacenters? Analysis of a Long-Term Dataset from a HPC Cluster.
11:20 Francesc-Josep Lordan Gomis, André Martin and Daniele Lezzi. Securing the Execution of ML Workflows across the Compute Continua.
11:40 Jessica Leone and Luca Traini. Enhancing Trace Visualizations for Microservices Performance Analysis.
12:00 George Kousiouris and Aristidemos Pnevmatikakis. Performance experiences from running an E-health inference process as FaaS across diverse clusters.
12:20 Lunch break
Session 3: Metaverse
14:00 Keynote 2: “Scaling the Metaverse - an AI perspective” by Tania Lorido Botran
15:00 Matthijs Jansen, Jesse Donkervliet, Animesh Trivedi and Alexandru Iosup. Can My WiFi Handle the Metaverse? A Performance Evaluation Of Meta’s Flagship Virtual Reality Hardware.
15:20 Afternoon coffee break
Session 4:
15:45 Keynote 3: “Four Hot Topics in Cloud Computing Performance in Klagenfurt” by Radu Prodan
16:45 Computing in the Cloud Continuum: Technological challenges, killer applications and future trends
Joint panel with the FastContinuum workshop.
17:30 End of HotCloudPerf 2023
Keynotes
Cristian Klein: Using Cloud Native Technologies to Understand the Performance of Cloud Native Technologies
Umeå University, Sweden
Description: The ambition of this talk is to help seed discussions around how cloud native technologies can help research on performance engineering, but also what are the interesting performance engineering challenges to solve with cloud native technologies.
Cloud native technologies are building blocks for creating a modern environment for hosting containerized applications. Among other things, great focus is placed on observability, which allows engineers to collect and analyze massive amounts of performance data in near real-time. Take as an example service meshes, which are a layer 7 network platform for containerized applications. Service meshes not only allow traffic engineering, but also add observability on top of a microservice application. Among other things, this allows understanding traffic patterns between microservices, including upstream-downstream relationships, request rate, etc., without writing a single line of code.
In this talk, I will discuss how cloud native technologies may help researchers in performance engineering. The benefits are three-fold. They allow researchers – e.g., PhD students – to be more productive, by getting the mechanism of collecting performance data out of the way. They improve collaboration because the effects of changing a parameter can be visualized in near real time. Finally, experiments are based on proven technologies with skills more widely available, which helps reproducibility.
I will illustrate these benefits through our research on adaptive service meshes. Indeed, service meshes have many parameters which impact application performance. Discussions with practitioners revealed a gap in understanding on how to effectively choose these parameters. We therefore proposed an adaptive controller that configures a service mesh so as to maintain a target tail response time.
Bio: Cristian Klein is a cloud architect at Elastisys and an adjunct lecturer at Umeå University. His role involves looking at data protection regulations and security best practices to make architectural decisions. He has gathered over 18 years of experience in operating IT systems, and has worked variously as a researcher, teacher, consultant and practitioner. His research interests include cloud native technologies, information security and service meshes.
Tania Lorido Botran: Scaling the Metaverse - an AI perspective
Roblox, USA
Description: When one hears the word Metaverse, it is automatically associated with millions of users, immersive experiences and its potential to change our lives. But what enables the Metaverse to function at such a scale? This talk will present the different challenges associated with handling 55 million daily users within the Roblox Metaverse. From addressing user Quality of Experience, to distributed architecture, programming model and scheduling, we will cover the entire stack underneath. In particular, I will pay special attention to the AI-Metaverse relationship. On one hand, a large proportion of workloads are based on one or multiple ML/DL models and I will present the challenges of scaling them. On the other hand, I will explore infrastructure and service model challenges that can be addressed with AI, e.g. multi-resource and multi-datacenter scheduling with reinforcement learning.
Bio: Dr. Tania Lorido Botran is a Research Scientist at Roblox. Prior to that, she worked at Microsoft and the Pacific Northwest National Laboratory. During her PhD, she had the opportunity to spend one year at Rice University and also did two internships at VMware and HP Labs. Dr. Lorido Botran received her PhD from the University of Deusto in Spain with a Cum Laude distinction, and her master’s degree in distributed systems from the University of the Basque Country with a highest marks distinction. Her current research interests span ML for systems, data center sustainability and fault-tolerance.
Radu Prodan: Four Hot Topics in Cloud Computing Performance in Klagenfurt
University of Klagenfurt, Austria
Description: The presentation discusses four hot Cloud computing topics researched at the University of Klagenfurt:
- Social media as today’s largest and most popular front-end application worldwide;
- Fine-grained simulation of backend serverless functions workflows on commercial clouds;
- Scheduling of workflow applications on the computing continuum assuring service level agreements;
- Sustainable processing of the massive graph representation of extreme data generated on the Internet.
Bio: Radu Prodan is a professor in distributed systems at ITEC, University of Klagenfurt, Austria. He received his Ph.D. in 2004 from the Vienna University of Technology and was an associate professor until 2018 at the University of Innsbruck, Austria. He is interested in performance, optimization, and resource management tools for parallel and distributed systems. He participated in numerous projects and coordinated, among others, the Horizon 2020 project ARTICONF. He coauthored over 200 publications and received three IEEE best paper awards.
Topics
Empirical performance studies in cloud computing environments, applications, and systems, including observation, measurement, and surveys.
Performance analysis using modeling and queueing theory for cloud environments, applications, and systems.
Simulation-based studies for all aspects of cloud computing performance.
Operational techniques for self-organization, resource management, and scheduling in cloud environments, e.g. service meshes, auto-scaling, auto-tiering.
End-to-end performance engineering for pipelines and workflows in cloud environments, or of applications with non-trivial SLAs.
Tools for monitoring and studying cloud computing performance.
General and specific methods and methodologies for understanding and engineering cloud performance.
Methodological and practical aspects of software engineering, performance engineering, and computer systems related to hot topics in cloud performance, e.g. serverless, microservices, non-von Neumann architectures, virtualization/containerization.
Case studies on cloud performance and its interaction with the computing continuum, including benchmarking, exploratory studies, dataset collection and negative results.
Sustainability and energy-efficiency in cloud computing environments, applications, and systems.
Network, storage and accelerators in the computing continuum.
Important Dates
Abstract due (updated): January 24, 2023 (originally January 17)
Papers due (hard deadline, no extension): January 24, 2023
Author notification: February 13, 2023 (originally February 27)
Camera-ready deadline: February 20, 2023 (originally March 6)
Workshop day: April 16, 2023
Submission Types
Full papers (6 pages including tables and figures but not references and appendices)
Short papers (3 pages including tables and figures but not references and appendices)
Talk only (1-2 pages, not included in the proceedings)
Format
The format of the submissions is single-blind and should follow the ACM format of the companion conference, ICPE.
All presented papers will have a good amount of time allocated for Q&A plus feedback. In addition, the presentation session will be wrapped up by a 10-15 min discussion.
Are you concerned about the quality of the presentation of your paper (i.e. whether it satisfies the standards for its structure, style of writing, etc.)? Please contact us at least two weeks prior to the submission deadline and we will provide feedback to you.
Instructions for Authors from ACM
By submitting your article to an ACM Publication, you are hereby acknowledging that you and your co-authors are subject to all ACM Publications Policies, including ACM’s new Publications Policy on Research Involving Human Participants and Subjects. Alleged violations of this policy or any ACM Publications Policy will be investigated by ACM and may result in a full retraction of your paper, in addition to other potential penalties, as per ACM Publications Policy.
Please ensure that you and your co-authors obtain an ORCID ID, so you can complete the publishing process for your accepted paper. ACM has been involved in ORCID from the start and we have recently made a commitment to collect ORCID IDs from all of our published authors. The collection process has started and will roll out as a requirement throughout 2022. We are committed to improve author discoverability, ensure proper attribution and contribute to ongoing community efforts around name normalization; your ORCID ID will help in these efforts.
Submission Site
Articles and talk-only contributions are required to be submitted via EasyChair. The track corresponding to HotCloudPerf2023 should be selected.
Call for Papers
You can find the full Call for Papers (CfP) here: CfP
Organizing Committee
Klervie Toczé, Linköping University, Sweden, (klervie.tocze@liu.se)
Cristina L. Abad, Escuela Superior Politécnica del Litoral, Ecuador, (cabadr@espol.edu.ec)
Nikolas Herbst, University of Würzburg, Germany, (nikolas.herbst@uni-wuerzburg.de)
Alexandru Iosup, VU Amsterdam, the Netherlands (a.iosup@vu.nl)
Program Committee
Cristina Abad, Escuela Superior Politécnica del Litoral
Auday Al-Dulaimy, Mälardalen University
Ahmed Ali-Eldin, Chalmers University of Technology
Atakan Aral, University of Vienna
Marta Beltran, Universidad Rey Juan Carlos
Andre Bondi, Software Performance and Scalability Consulting LLC
Marc Brooker, Amazon Web Services
Wilhelm Hasselbring, University of Kiel
Nikolas Herbst, University of Würzburg
Alexandru Iosup, Vrije Universiteit Amsterdam
Dragi Kimovski, University of Klagenfurt
Tania Lorido, Roblox
Satadru Pan, Meta
Riccardo Pinciroli, Gran Sasso Science Institute
Issam Rais, The Arctic University of Norway
Prateek Sharma, Indiana University Bloomington
Sacheendra Talluri, Vrije Universiteit Amsterdam
Petr Tůma, Charles University
André van Hoorn, University of Hamburg
Chen Wang, IBM