Marco D’Ambros is the director of CodeLounge, the R&D Center of the Software Institute at Università della Svizzera italiana. CodeLounge blends academic knowledge and industry experience to pursue both fundamental and applied research in software engineering. Marco earned his PhD in 2010, specializing in mining software repositories. After that, he joined Palantir Technologies, a leading Silicon Valley data analytics & AI firm, where he helped government agencies and major enterprises analyze large-scale, fragmented data, leading technical project execution worldwide. Since returning to academia, Marco has led numerous research and development projects at CodeLounge. His work spans several domains including artificial intelligence, natural language processing, software quality assessment, and software engineering. Marco’s contributions to the research community have earned him international recognition. Notably, he received the Distinguished Paper Award at ESEC/FSE 2022 (International Conference on the Foundations of Software Engineering), and the Most Influential Paper Award at MSR 2020 (International Conference on Mining Software Repositories).
Data analytics doesn't have to be slow or cumbersome. With Polars, a next-generation DataFrame library built for performance and simplicity, you can process massive datasets faster than ever. In this hands-on workshop, you will discover how to leverage Polars to support your data workflows from the fundamentals to the more advanced analytics scenarios.
In the first part of the workshop, we will introduce Polars as a modern library for data analytics in Python, covering its data model and engine design. We will also compare it with other analytics tools: Pandas, which is effective for smaller datasets, and Apache Spark, which scales horizontally but is inefficient for smaller datasets because of the overhead of its distributed infrastructure.
The second part will dive into a real-world dataset of NYC taxi trips, containing hundreds of millions of rows. We will walk through efficient data ingestion and cleaning, and then explore how to perform lightning-fast queries and groupings directly in a notebook environment. You will also learn how to visualize your findings with simple yet powerful plotting techniques.
Finally, we will tackle more complex and realistic analytics scenarios, such as identifying the top-earning drivers over a specific time window, and visualizing their most profitable trips and neighborhoods. Through these scenarios, we will showcase how Polars empowers you to move seamlessly from raw data to meaningful insights.