FugueSQL – SQL for Pandas, Spark and Dask
In this session, we will introduce FugueSQL, a language that allows heavy SQL users to work on Python-based DataFrames.
FugueSQL lets users express computation workflows through a SQL-like interface. The code is then parsed and executed on the Pandas, Spark, or Dask engine. To provide a rich grammar for handling data, FugueSQL extends ANSI SQL with keywords specific to distributed computing (PERSIST, BROADCAST, PARTITION), as well as support for custom Python functions. These enhancements let users leverage distributed computing while working in a language they are already familiar with.
About the Speakers
Kevin Kho is an Open Source Community Engineer at Prefect, an open-source workflow orchestration management system. Previously, he was a data scientist at Paylocity, where he worked on adding machine learning features to their Human Capital Management (HCM) Suite. Outside of work, he contributes to Fugue, one of the SQL interfaces for Dask. He also organizes the Orlando Machine Learning and Data Science Meetup.
Rowan Molony has just started as an energy analysis software developer at Mainstream Renewable Power where he helps maintain their data platform. Before Mainstream, he was a data scientist at Codema – Dublin’s Energy Agency, where he introduced open code and open data to enable the team to create reproducible energy systems models.