The research seminar takes place on Tuesdays between 15:00 and 16:30 in Room 6 (Basement, Árpád tér 2). The talks are about 30 minutes long, with time for questions afterwards, and after 16:30 there is room for free discussion as well. Snacks and refreshments will be served (bring your own mug, if possible).
Everyone is welcome!
Are these descriptions referring to the same entity or just to a similar one?
In recent years, Language Models have ruled the NLP scene; however, little research has gone into how well they represent similar terms. The main focus of my presentation is how to differentiate two concepts with the same meaning from two that are merely similar to each other. I took on the task of graph matching, where, given two graphs, one must find the pairs of nodes that represent the same concept, and built a multi-step system that proposes such pairs with the help of Language Models, resulting in the best-performing system on 2 out of 5 datasets of the OAEI Knowledge Graph matching track.
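The abstract does not spell out how candidate pairs are scored; as a minimal, hypothetical sketch (the function names and the threshold are my own, not the system's), a pipeline of this kind might compare Language-Model embeddings of the two graphs' concepts with cosine similarity:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two (non-zero) embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def propose_pairs(left, right, threshold=0.8):
    # left, right: dicts mapping concept id -> embedding vector,
    # one dict per knowledge graph. Propose cross-graph pairs whose
    # embeddings are similar enough to plausibly denote the same entity.
    pairs = []
    for lid, lvec in left.items():
        for rid, rvec in right.items():
            if cosine_similarity(lvec, rvec) >= threshold:
                pairs.append((lid, rid))
    return pairs
```

A real matcher would of course add further filtering steps on top of this raw similarity proposal stage.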
WECALM: A Special Structural-Based Weighted Network Approach for the Analysis of Protein Complexes
The detection and analysis of protein complexes is essential for understanding functional mechanisms and cellular integrity. Recently, several techniques for detecting and analysing protein complexes from Protein-Protein Interaction (PPI) datasets have been developed. Most of those techniques are inefficient: they struggle to detect overlapping complexes, exclude attachment proteins from complex cores, fail to detect the inherent structures of the underlying complexes, suffer from high false-positive rates, and lack an enrichment analysis. To address these limitations, we introduce a special structural-based weighted network approach for the analysis of protein complexes based on Weighted Edge, Core-Attachment and Local Modularity structures (WECALM), implemented in six steps. First, we construct the PPI network with a weighted-edge approach using the Jaccard coefficient similarity. Second, we identify the overlapping proteins by the average node degree and betweenness values of their immediate neighbors. Third, we identify local structural modularity by a modularity score function. Fourth, we identify the core protein complex using the structural similarities of the seed protein and its immediate neighbors. Fifth, we provide an efficient method to detect protein complexes by appending attachment proteins to the detected core protein complexes. Finally, in the sixth step, we calculate the p-value to validate the biological significance of the detected protein complexes by a functional enrichment analysis. Our simulation results indicate that WECALM outperforms existing algorithms in terms of accuracy, computational time, and p-value. A functional enrichment analysis also shows that WECALM is able to identify a large number of biologically significant protein complexes. Overall, WECALM outperforms eight other approaches by striking a better balance of accuracy and efficiency in the detection of protein complexes.
Keywords: Protein complexes; Core-attachment; Local modularity structure; Weighted PPI network
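The first step, edge weighting by Jaccard coefficient similarity, can be sketched as follows. This is an illustrative reading only: including each endpoint in its own neighbourhood is one common convention in PPI edge weighting, not necessarily WECALM's exact choice.

```python
def jaccard_weight(graph, u, v):
    # graph: dict mapping protein -> set of interacting proteins.
    # Weight the edge (u, v) by the Jaccard similarity of the two
    # neighbourhoods, with each endpoint counted in its own set.
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    return len(nu & nv) / len(nu | nv)

def weighted_edges(graph):
    # Return each undirected edge once, annotated with its Jaccard weight.
    seen = set()
    edges = []
    for u, nbrs in graph.items():
        for v in nbrs:
            if (v, u) not in seen:
                seen.add((u, v))
                edges.append((u, v, jaccard_weight(graph, u, v)))
    return edges
```

Proteins whose interaction partners overlap strongly thus get heavier edges, which the later core-detection steps can exploit.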
A novel approach to FAIR guidelines for research tools
Program slicing is one of the most important applications of source code analysis. Over the years, different variations of it have been implemented in many tools, depending on the goal to be achieved. What these programs have in common, however, is that they were not intended for publication, and their subsequent use may be problematic. In this paper, we present a set of principles that builds on the FAIR guidelines and describes the basics and common criteria for publishing research software in a more accurate and informative way.
Down the Rabbit Hole: Segmentation Metric Misinterpretations in Bioimage Analysis
In today's scientific environment, with increasing attention on AI solutions for imaging problems, a plethora of new image segmentation and object detection methods emerge. Thus, quantitative evaluation is crucial for an objective assessment of algorithms. Often, object detection and segmentation tasks utilize evaluation metrics with the same name but a different meaning, due to the differences between object-level and pixel-level classification, or simply because multiple interpretations coexist. One could argue that in most cases the meaning should be clear from the context; however, specific and often under-specified characteristics of the circumstances (e.g. small variations of the task) can make it hard for readers to understand the exact meaning of the metrics. My presentation focuses on the various interpretations that have emerged in the research communities around some segmentation scores. We could identify 5 different definitions of the “average precision (AP)” metric and 6 different interpretations of “mean average precision (mAP)” in the literature. To make things even more complicated, even when methods work with the same dataset, the metrics used to evaluate performance are not necessarily the same. The aims of my presentation are to shed light on some of the main issues with the current state of segmentation and object detection metrics, and to investigate the reasons for the ambiguous use of classification concepts. I am also going to point out the problems of using similar metrics with nuanced differences by evaluating the 2018 Data Science Bowl (DSB) and 2021 Sartorius neuron segmentation challenge submissions with metrics of similar meaning but slightly differing interpretations.
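To make the ambiguity concrete, here is a minimal sketch (names and the matching rule are my own, illustrating just one of the coexisting interpretations) of object-level precision at a fixed IoU threshold; a pixel-level reading of "precision" over the same masks would generally give a different number.

```python
def iou(mask_a, mask_b):
    # Intersection-over-Union of two binary masks given as sets of pixels.
    inter = len(mask_a & mask_b)
    union = len(mask_a | mask_b)
    return inter / union if union else 0.0

def object_level_precision(predictions, ground_truth, threshold=0.5):
    # One common reading: a predicted object counts as a true positive
    # if it overlaps a so-far-unmatched ground-truth object with
    # IoU >= threshold; precision = TP / number of predictions.
    matched = set()
    tp = 0
    for pred in predictions:
        for i, gt in enumerate(ground_truth):
            if i not in matched and iou(pred, gt) >= threshold:
                matched.add(i)
                tp += 1
                break
    return tp / len(predictions) if predictions else 0.0
```

Changing the threshold, the matching order, or averaging over thresholds yields the families of AP/mAP variants discussed in the talk.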
A better approach for reproducible research in unexplored research fields
In recent years, the hype around artificial intelligence has grown tremendously, and many research fields have shown a will to adopt data-driven, and more specifically deep learning, methods. On the one hand, these new approaches can give an important advantage in those fields; on the other hand, machine learning methods are non-trivial to use and deploy effectively, so it is crucial to pay attention to some potential issues in cross-domain publications, especially the reproducibility of the experiments and the reusability of the published code. In this presentation, an analysis of the research branch focused on the application of deep learning models in the biomedical research field will be shown. Not all fields are equally developed when it comes to deep learning; a lot of attention is given to image processing and speech-related tasks in general, but in other, more specific fields we are left with much less literature: we can only find a few, small, publicly available datasets, and the applied modeling techniques are usually considered obsolete.
Improving vulnerability prediction with Transfer Learning
One of the biggest obstacles in the way of successfully detecting vulnerabilities using deep neural networks is the relatively small amount of training data. Training these networks requires a lot of data to get the best results, so an insufficient amount of data leads to poor performance. This is a well-known problem in the field of deep learning, and there are multiple ways of overcoming it; one of them is transfer learning. In this research, we are building a large pretraining dataset of SonarQube warnings to use for transfer learning in order to improve our vulnerability prediction performance.
Who needs tests? Not me
Call graphs are fundamental for many higher-level code analyses. The selection of the most appropriate call graph construction tool for an analysis is not always straightforward and depends on how the results will be used. The choice of call graph construction tool has a great effect on the execution time, memory usage, and result quality of the subsequent tasks. This research compares the resulting static and dynamic Java call graphs to assist in the selection of the most appropriate tools. Static call graphs, as their name suggests, are constructed by static analysis, based on the source code or the bytecode, without executing tests or any code parts. This means that the project can be analyzed in its early stages and with fewer resources, but there is concern that this will result in less accurate, noisier graphs, since the dynamic behavior of the programs is estimated by static algorithms. Inaccuracies can greatly affect analyses based on call graphs. On the other hand, dynamic call graphs are created during the actual execution of the program. The calls that are included as edges in the graph are exactly those that were executed during the run, so one can expect the result to be more accurate. However, dynamic analysis requires more resources and the execution of code via test cases that provide high test coverage. In this work, we investigated the relationship between dynamic and static call graphs. Is the graph generated by dynamic analysis really better? Can static graphs approximate or even complement dynamic call graphs with sound results? To find the answers to these questions, we compared the results of five static analyzers and one dynamic analyzer. They were evaluated on three projects of different sizes and test coverage. We also included in the comparison a merged graph created by combining the different static analyzer outputs.
Not only did we compare static graphs to the dynamic results, we also validated the calls in a dynamic graph and found that these graphs can mislead the user. The results show that dynamic graphs should be considered good, although not a gold standard, since they contain phantom calls: calls that are not present in the source code. Such calls are not limited to synthetic calls. Static analyzers cannot be applied without consideration either, but a combination of static call graphs does tend to contain calls similar to those in the dynamic graphs, with no phantom calls.
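The idea of a static call graph can be illustrated with a toy extractor. This is a Python sketch of my own (the tools in the talk target Java), and its crudeness also shows why static graphs are incomplete: attribute calls and dynamic dispatch are simply invisible to it.

```python
import ast

def static_call_graph(source):
    # Build a crude static call graph from Python source: map each
    # top-level function name to the set of plain-name calls in its body.
    # Attribute calls (obj.method()) and dynamic dispatch are ignored,
    # which is exactly why static graphs can miss real edges.
    tree = ast.parse(source)
    graph = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            calls = set()
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    calls.add(sub.func.id)
            graph[node.name] = calls
    return graph
```

A dynamic graph, by contrast, would record exactly the edges exercised by the executed tests, including reflective calls this extractor can never see.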
Automated Program Repair with the GPT Family, including GPT-2, GPT-3 and Codex
OpenGL API Call Trace Reduction with the Minimizing Delta Debugging Algorithm
Debugging an application that uses a graphics API and faces a rendering error is a hard task, even if we manage to record a trace of the API calls that lead to the error. Checking every call is not a feasible or scalable option, since there are potentially millions of calls in a recording. In this paper, we focus on the question of whether the number of API calls that need to be examined can be reduced by automatic techniques, and we describe how this can be achieved for the OpenGL API using the minimizing Delta Debugging algorithm. We present the results of an experiment on a real-life rendering issue, using a prototype implementation, showing a drastic reduction of the trace size (i.e., to less than 1‰ of the original number of calls) and positive impacts on the resource usage of the replay of the trace.
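A simplified sketch of the minimizing Delta Debugging idea follows. This complement-only variant omits the subset tests of the full ddmin algorithm, and `fails` stands in for replaying the reduced trace and checking whether the rendering error still reproduces:

```python
def ddmin(calls, fails):
    # Simplified minimizing Delta Debugging: split the failing call
    # trace into n chunks, try dropping one chunk at a time, and keep
    # any reduced trace on which the failure still reproduces.
    n = 2
    while len(calls) >= 2:
        chunk = max(1, len(calls) // n)
        subsets = [calls[i:i + chunk] for i in range(0, len(calls), chunk)]
        reduced = False
        for i in range(len(subsets)):
            complement = [c for j, s in enumerate(subsets) if j != i for c in s]
            if fails(complement):            # failure persists without chunk i
                calls = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n >= len(calls):              # finest granularity reached
                break
            n = min(n * 2, len(calls))       # refine the partition
    return calls
```

On a trace of millions of OpenGL calls, each `fails` check is a trace replay, so the number of replays, not the code above, dominates the reduction cost.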
Determining the axes of rotation of the lower jaw
Recording and reproducing mandibular movements have been of key importance in the practice of dentistry for over a century. Recently, it has become possible to use digital technologies for these tasks. In our study, we have investigated the possibilities of 3D intraoral scanners and optimization-based algorithms for this problem. This is a novel approach compared to most of the published studies, where CBCT and live MRI were used, mostly with the purely geometric, rule-based Reuleaux method. This work is a cooperation with the Department of Oral and Maxillofacial Surgery of the Albert Szent-Györgyi Medical School, SZTE.
Accelerating Transformer Inference Time through Layer Stitching and Reduction
Compact models are important in areas where low latency is necessary or training/evaluation is highly constrained by computational resources. There are already existing solutions that tackle the problem in different ways, such as quantization, pruning, or knowledge distillation. We assume that each layer contributes to the overall objective at different rates, and we try to identify and remove the less influential layers. We propose two approaches to improve the inference time in a task-specific aspect. Following previous studies, we propose a self-stitching mechanism which adds new weights to the model with the constraint that only these weights can be trained. The trained parameters are responsible for stitching the two ends of the same model while skipping several layers in-between. The other approach aims to discard highly specialized layers before fine-tuning, which can help the convergence of the model while reducing the overall inference time. We have coined the term 'DiscardBERT' to refer to this approach.
Optimization of Combinatorial Systems and Processes, Development of Algorithms
The process of network synthesis is a commonly used method for decreasing material and energy consumption and mitigating negative environmental impacts, thereby increasing profitability. However, finding the optimal synthesis with minimal cost presents a challenge, as the problem is NP-hard. To address this challenge, a branch-and-bound algorithm is employed, taking advantage of the structural characteristics of the possible syntheses to reduce the solution space significantly. While this acceleration technique is beneficial, other aspects of the algorithm have not been extensively examined. In light of this, we propose a new lower bound that gives a tighter estimate than previous relaxed versions by taking the optimal structural aspects into account. This lower bound is particularly crucial when using non-linear or stochastic units in the synthesis, as these require more time to evaluate. Additionally, we extend our acceleration technique to address specific synthesis problems, such as the construction and operation of electric transmission networks, which involve undirected connections. We expand the unit model and adapt the MSG algorithm and neutral extension finding to accelerate the problem.
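As a hypothetical, much-simplified illustration of branch-and-bound pruning (the actual P-graph/MSG machinery is far richer; the bound used below is just the accumulated cost, whereas the talk argues for tighter structural bounds): choose a minimum-cost set of units whose outputs cover a required product set.

```python
def cheapest_cover(units, required):
    # Tiny branch-and-bound over include/exclude decisions per unit.
    # units: list of (cost, set_of_products); required: set of products.
    # Returns (best_cost, chosen_unit_indices); (inf, None) if infeasible.
    best = [float("inf"), None]

    def search(i, cost, covered, chosen):
        if cost >= best[0]:            # prune: cannot beat the incumbent
            return
        if required <= covered:        # feasible: record a new incumbent
            best[0], best[1] = cost, list(chosen)
            return
        if i == len(units):
            return
        unit_cost, outputs = units[i]
        chosen.append(i)               # branch 1: include unit i
        search(i + 1, cost + unit_cost, covered | outputs, chosen)
        chosen.pop()
        search(i + 1, cost, covered, chosen)   # branch 2: exclude unit i

    search(0, 0, set(), [])
    return best[0], best[1]
```

A tighter lower bound would replace the `cost >= best[0]` test with `cost + bound(remaining) >= best[0]`, pruning far more of the tree, which is the point of the proposed estimate.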
Fourier Domain CT Reconstruction with Complex Valued Neural Networks
In computed tomography, several well-known techniques exist that can reconstruct a cross section of an object from a finite set of its projections, the sinogram. This task – the numerical inversion of the Radon transform – is well understood, with state-of-the-art algorithms mostly relying on back-projection. Even though back-projection carries a significant computational burden compared to the family of direct Fourier reconstruction methods, the latter class of algorithms is less popular due to the complications related to frequency-space resampling. Moreover, interpolation errors during resampling in the frequency domain can lead to artifacts in the reconstructed image. Here, we present a novel neural-network-assisted reconstruction method that reconstructs the object in frequency space while also taking the well-understood Fourier slice theorem into account. In our case, the details of the approximated resampling are learned by the network for peak performance. We show that with this method it is possible to achieve comparable, and in some cases better, reconstruction quality than with another state-of-the-art algorithm also working in the frequency domain.
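For reference, the Fourier slice theorem the method builds on states that the 1-D Fourier transform of a parallel projection of the object equals a radial slice of the object's 2-D Fourier transform:

```latex
% Parallel projection of f at angle \theta (the Radon transform):
\[
  p_\theta(t) = \int_{-\infty}^{\infty}
      f(t\cos\theta - s\sin\theta,\; t\sin\theta + s\cos\theta)\,\mathrm{d}s
\]
% Its 1-D Fourier transform equals the slice of \hat{f} along angle \theta:
\[
  \hat{p}_\theta(\omega)
    = \int_{-\infty}^{\infty} p_\theta(t)\, e^{-i\omega t}\,\mathrm{d}t
    = \hat{f}(\omega\cos\theta,\; \omega\sin\theta)
\]
```

The sinogram thus populates frequency space along radial lines, and the resampling onto a Cartesian grid is exactly the step the network is asked to approximate.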
The Fritz-John condition system in the Interval Branch and Bound method
The Interval Branch and Bound (IBB) method is a good choice when a rigorous solution is required. This method handles computational errors in calculations. Few IBB implementations use the Fritz-John (FJ) optimality conditions to eliminate non-optimal boxes in a constrained nonlinear programming problem. The FJ optimality condition effectively means a solution to an interval-valued system of equations. In the best case, the solution is an empty set if the interval box does not contain an optimum. In many cases, solving this system of equations fails. This can be caused by the fact that the interval box contains many solutions, the defined system of equations contains unnecessary conditions, or the interval Gauss-Seidel method fails. These unsuccessful attempts have a negative outcome and only increase the computation time. In this talk, we propose four modifications to reduce the runtime and computational complexity of the Interval Branch and Bound method. In addition, we focus on a preliminary test ensuring that the Fritz-John system of equations is solved only if we are sure that a solution exists in the interval box. We describe a method for constructing a conic hull from the enclosure of the gradients of the active constraints. The conic hull is used to determine whether each interval box contains an optimal solution. If the test is satisfied, we solve the Fritz-John system of equations and reduce the interval box. Otherwise, we can discard the interval box because it does not contain an optimal solution. We present the effectiveness of the modifications and the preliminary test with experimental results.
Impact of branching strategies on productivity in Mono/Multi Repository Structures
Productivity is a key aspect of the project development process and has been analyzed from different angles. However, none of those analyses has focused on branching strategies and repository structures. This paper analyzes more than 3 million GitHub repositories and creates a solid database of over 50,000 selected mono/multi-repository projects. Based on this database, three main branching strategies have been identified among the most productive projects. The results of this paper are grouped according to the team size, development environment, and repository structure of the projects.
CURVED line Detector/Descriptor
A detector-descriptor pipeline has been proposed for detecting and matching curved lines in equirectangular images. The detector in the pipeline is a neural network that produces a heatmap indicating the presence of curved lines in the equirectangular images. The line extractor then extracts the line segments from the heatmap by thresholding and clustering the heatmap values. Once the line segments are extracted, the descriptor extracts the line features from a patch of size 48×32. The matching of line pairs between two images is performed using a distance-based approach, where the distance between the descriptors of the lines is calculated. The pose estimation is obtained by using a RANSAC-based algorithm, specifically Cayley RANSAC, to estimate the camera pose. The proposed pipeline has been evaluated on the KITTI-360 dataset of equirectangular images.
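The matching stage can be sketched as follows. This is a generic mutual-nearest-neighbour scheme under Euclidean distance; the actual 48×32-patch descriptor and any thresholding used in the pipeline are not reproduced here.

```python
def l2(a, b):
    # Euclidean distance between two descriptor vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def match_lines(desc_a, desc_b):
    # Distance-based matching: pair each line descriptor in image A with
    # its nearest neighbour in image B, keeping only mutual best matches
    # to suppress one-sided, likely spurious pairings.
    def nearest(d, pool):
        return min(range(len(pool)), key=lambda i: l2(d, pool[i]))
    matches = []
    for i, da in enumerate(desc_a):
        j = nearest(da, desc_b)
        if nearest(desc_b[j], desc_a) == i:   # mutual consistency check
            matches.append((i, j))
    return matches
```

The surviving line correspondences are what a RANSAC-based solver would then consume for pose estimation.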
PCA improves the adversarial robustness of neural networks
Deep neural networks perform well in many visual recognition tasks, but they are sensitive to adversarial input perturbations. More robust models can be learned when attacks are applied to the training data or when preprocessing is used. However, the effect of preprocessing is frequently underestimated, and it has not received sufficient attention, as it usually does not affect the network's clean accuracy. Here, we seek to demonstrate that preprocessing can play a role in improving adversarial robustness. Our empirical results show that principal component analysis, a simple yet effective preprocessing method, can significantly improve neural networks' robustness under both regular and adversarial training.
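A minimal, NumPy-free sketch of the kind of PCA preprocessing step evaluated here (power iteration recovers only the first principal component of non-degenerate data; a real pipeline would keep several components before feeding the network):

```python
def pca_top_component(data, iters=100):
    # First principal component of mean-centred data via power iteration
    # on the covariance matrix (pure Python, assumes non-zero variance).
    n, d = len(data), len(data[0])
    mean = [sum(row[k] for row in data) / n for k in range(d)]
    centred = [[row[k] - mean[k] for k in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def project(data, v):
    # Project each sample onto the principal direction: the
    # dimensionality-reducing preprocessing applied before the network.
    return [sum(x * vk for x, vk in zip(row, v)) for row in data]
```

Discarding the low-variance directions is precisely what can strip away small adversarial perturbations while leaving clean accuracy largely intact.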
Fluctuation Enhanced Gas Sensing
Fluctuation Enhanced Sensing has been an active field of research for some years among those who study the applicability of noise. Our goal is to combine the principle with machine learning to build an application that is capable of odor detection. In our work, we examined the feasibility of a microcontroller-based, long-lifetime application, mainly from the perspective of power consumption, and reviewed the already available technologies. Our current research aims to detect different odors using multiple commercially available gas sensors by examining their output resistance signals in both the time domain and the frequency domain.
General characterisation of human activity and comparison of its determination methods
We participated in an extensive research project in cooperation with psychiatrists and biophysicists. The collaboration involved the measurement of raw acceleration signals on the non-dominant wrist of 42 healthy, free-living subjects over 10 days to measure their locomotor activity. The data acquisition had several interdisciplinary objectives. Firstly, we were interested in determining how activity signals could be quantified from the acceleration signals. Such actigraphic measurements are an important part of research in different disciplines, yet the procedure for determining activity values is unexpectedly not standardized in the literature. The acceleration data can be preprocessed in diverse ways, and then the activity values can be calculated using various activity metrics. Therefore, several types of activity signals can be determined from the same recording. To resolve these methodological inconsistencies, we executed a detailed and comprehensive comparison of the activity calculation procedures by assessing the relationship between the different types of activity signals derived from the previously mentioned dataset. The correlation pattern revealed that most activity metrics produce closely related activity signals from identically preprocessed acceleration recordings, but in practice, the data preparation varies between manufacturers and methods. In the world of human dynamics analysis, the scale-free nature of temporal and spatial patterns is a recurring motif. This universality has already been identified by our research group in human location displacement data in the form of 1/f-type noise, which is a special form of power-law scaling in the frequency domain. The scale-free properties of human activity have also been identified through statistical analysis (e.g., the distribution of passive periods); however, the description of human activity's spectral characteristics was incomplete.
To explore the general spectral nature of human activity in greater detail, we analyzed their fluctuations. We revealed that the spectra of the different types of activity signals generally follow a universal characteristic, including 1/f noise at frequencies above the circadian rhythmicity. Moreover, we discovered that the spectrum of the raw acceleration signal has this same characteristic, and therefore the scale-free nature is generally inherent to the motor activity of healthy, free-living humans.
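The 1/f-type (scale-free) spectral characteristic referred to above is a power law in the power spectral density, which appears as a straight line of slope $-\beta$ on a log-log plot:

```latex
% Scale-free (1/f-type) spectrum: power-law power spectral density,
% with exponent \beta \approx 1 for 1/f noise.
\[
  S(f) \propto \frac{1}{f^{\beta}}
  \quad\Longleftrightarrow\quad
  \log S(f) = -\beta \log f + \mathrm{const.}
\]
```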