Home
I am an Economics PhD Candidate at the Department of Economics and Business at Central European University. I’m supervised by Adam Szeidl. I especially enjoy constructing efficient data pipelines to understand patterns in otherwise difficult-to-handle, large data sets.
I received my MA in Economics at CEU and my BA in Applied Economics at Corvinus University of Budapest. I also have a pre-degree certificate from the Business and Management BA program of Corvinus University of Budapest.
I have also been a research assistant at the CEU Microdata Research Group since 2015, where I mostly work on data engineering tasks.
In 2022, I joined Spreadmonitor as a Senior Data Scientist, advising the Data Science Team on multiple predictive models related to energy markets.
Projects
The Shapley Value in Machine Learning
Over the last few years, the Shapley value, a solution concept from cooperative game theory, has found numerous applications in machine learning. In this paper, we first discuss fundamental concepts of cooperative game theory and axiomatic properties of the Shapley value. Then we give an overview of the most important applications of the Shapley value in machine learning: feature selection, explainability, multi-agent reinforcement learning, ensemble pruning, and data valuation. We examine the most crucial limitations of the Shapley value and point out directions for future research.
A preprint of our paper is available on ArXiv. The published paper is available on IJCAI’s site.
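To make the solution concept concrete, here is a minimal sketch of an exact Shapley value computation: each player's value is its marginal contribution averaged over all orderings of the players. The function name `shapley_values` and the three-player game `toy_value` are hypothetical illustrations, not taken from the paper, and the brute-force enumeration is only practical for a handful of players.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining the players before it
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


# Hypothetical game (an assumption for illustration): a coalition earns 1
# only if it contains "a" together with at least one of "b" or "c".
def toy_value(coalition):
    has_partner = "b" in coalition or "c" in coalition
    return 1.0 if "a" in coalition and has_partner else 0.0


phi = shapley_values(["a", "b", "c"], toy_value)
print(phi)
```

In this toy game the values sum to the worth of the grand coalition (the efficiency axiom), and the interchangeable players "b" and "c" receive equal shares (the symmetry axiom).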
PyTorch Geometric Temporal
PyTorch Geometric Temporal is a deep learning framework combining state-of-the-art machine learning algorithms for neural spatiotemporal signal processing. The main goal of the library is to make temporal geometric deep learning available for researchers and machine learning practitioners in a unified easy-to-use framework. PyTorch Geometric Temporal was created with foundations on existing libraries in the PyTorch eco-system, streamlined neural network layer definitions, temporal snapshot generators for batching, and integrated benchmark datasets. These features are illustrated with a tutorial-like case study. Experiments demonstrate the predictive performance of the models implemented in the library on real world problems such as epidemiological forecasting, ride-hail demand prediction and web-traffic management. Our sensitivity analysis of runtime shows that the framework can potentially operate on web-scale datasets with rich temporal features and spatial structure.
The GitHub repository of the project is available here.
A preprint of our corresponding paper is available on ArXiv. The published paper is available in the ACM DL.
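The idea of temporal snapshots on a fixed graph can be sketched in plain Python. This is not the library's API: the helper `lagged_snapshots`, the toy `counts` series, and the `edges` list are all assumptions made for illustration. Each snapshot pairs every node's last few observations (features) with its next observation (target), while the graph structure stays the same across time.

```python
def lagged_snapshots(series, lags):
    """Yield (features, target) pairs from a list of per-time-step node values.

    features[i] holds node i's previous `lags` values; target[i] is its next value.
    """
    for t in range(lags, len(series)):
        n_nodes = len(series[t])
        features = [
            [series[s][i] for s in range(t - lags, t)] for i in range(n_nodes)
        ]
        target = list(series[t])
        yield features, target


# Made-up counts for three nodes over five time steps (illustrative only)
counts = [
    [1, 0, 2],
    [2, 1, 2],
    [3, 1, 1],
    [4, 2, 0],
    [5, 2, 1],
]
edges = [(0, 1), (1, 2)]  # static graph shared by every snapshot

snapshots = list(lagged_snapshots(counts, lags=2))
for features, target in snapshots:
    print(features, "->", target)
```

A recurrent graph neural network would consume such snapshots one at a time, combining each feature matrix with the shared edge list.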
Chickenpox cases - Hungary
Recurrent graph convolutional neural networks are highly effective machine learning techniques for spatiotemporal signal processing. Newly proposed graph neural network architectures are repetitively evaluated on standard tasks such as traffic or weather forecasting. In this paper, we propose the Chickenpox Cases in Hungary dataset as a new dataset for comparing graph neural network architectures. Our time series analysis and forecasting experiments demonstrate that the Chickenpox Cases in Hungary dataset is adequate for comparing the predictive performance and forecasting capabilities of novel recurrent graph neural network architectures.
The dataset can be accessed here.
A preprint of our corresponding paper is available on ArXiv. The paper has been accepted to the Workshop on Graph Learning Benchmarks (GLB 2021) at The Web Conference 2021.
Karate Club
Karate Club is an unsupervised machine learning extension library for NetworkX. It builds on other open source linear algebra, machine learning, and graph signal processing libraries such as Numpy, Scipy, Gensim, PyGSP, and Scikit-Learn. Karate Club consists of state-of-the-art methods to do unsupervised learning on graph structured data. To put it simply, it is a Swiss Army knife for small-scale graph mining research. First, it provides network embedding techniques at the node and graph level. Second, it includes a variety of overlapping and non-overlapping community detection methods. Implemented methods cover a wide range of network science (NetSci, Complenet), data mining (ICDM, CIKM, KDD), artificial intelligence (AAAI, IJCAI) and machine learning (NeurIPS, ICML, ICLR) conferences, workshops, and pieces from prominent journals.
The GitHub repository of the project is available here.
A preprint of our corresponding paper is available on ArXiv. The published paper is available in the ACM DL.
Little Ball of Fur
Little Ball of Fur consists of methods to do sampling of graph structured data. First, it includes a large variety of vertex, edge and exploration sampling techniques. Second, it provides a unified application public interface which makes the application of sampling algorithms trivial for end-users. Implemented methods cover a wide range of networking (Networking, INFOCOM, SIGCOMM) and data mining (KDD, TKDD, ICDE) conferences, workshops, and pieces from prominent journals.
The GitHub repository of the project is available here.
A preprint of our corresponding paper is available on ArXiv. The published paper is available in the ACM DL.
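As a rough illustration of what vertex sampling means, the sketch below draws a uniform random subset of nodes and keeps only the edges between them (the induced subgraph). It is written in plain Python and does not use or reproduce the library's interface: the function `random_node_sample` and the toy edge list are assumptions for illustration.

```python
import random


def random_node_sample(edges, n_nodes, seed=42):
    """Sample `n_nodes` vertices uniformly and return them with the induced edges."""
    rng = random.Random(seed)  # local generator so the sample is reproducible
    nodes = sorted({v for edge in edges for v in edge})
    kept = set(rng.sample(nodes, n_nodes))
    # Keep an edge only if both of its endpoints were sampled
    induced = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, induced


# Made-up undirected graph on five nodes (illustrative only)
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 3)]

kept_nodes, sampled_edges = random_node_sample(toy_edges, n_nodes=3)
print(sorted(kept_nodes), sampled_edges)
```

Edge and exploration samplers follow the same pattern of taking a graph in and returning a smaller graph, but they choose what to keep by drawing edges or by walking the graph rather than by picking nodes independently.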
Managers in Politics
With Arieda Muço (CEU), we aim to provide a better understanding of how managers intertwine with local politics.