Lengyel Balázs (MTA KRKT, head of the Agglomeration and Social Networks Lendület Research Group): Spatial diffusion and churn over the life-cycle of innovation: the role of social networks
Innovative ideas, products or services spread on social networks that, in the digital age, are maintained via telecommunication tools such as email or social media. One of the last standing puzzles in social contagion is the role of physical space, and it is not fully understood how products disappear from the map at the end of their life-cycle. In this paper, we utilize a unique dataset compiled from a Hungarian on-line social network (OSN) to uncover novel features in the spatial adoption and churn of digital technologies. The studied OSN was established in the early 2000s and failed in international competition a decade later. A Bass Diffusion Model describes how the product gets adopted in the overall population. However, it cannot predict spatial diffusion. The novel ingredients missing from the model are the assortativity of adoption time, urban scaling of adoption over the product life-cycle, and a distance decay function of diffusion probability. We find that early adopter towns also churn early, while individuals tend to follow the churn of nearby friends and are less influenced by the churn of distant contacts.
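For readers unfamiliar with the Bass Diffusion Model mentioned above, the following is a minimal sketch of its usual discrete-time form for the overall population. The function name and the parameter values (`p`, `q`, `m`) are illustrative assumptions and are not taken from the talk. The spatial ingredients the abstract lists (assortativity, urban scaling, distance decay) are deliberately not modelled here.

```python
def bass_adopters(p, q, m, steps):
    """Cumulative adopters under a discrete-time Bass diffusion model.

    p: innovation coefficient, q: imitation coefficient,
    m: market potential (all values used below are hypothetical).
    """
    cumulative = [0.0]
    for _ in range(steps):
        n = cumulative[-1]
        # Innovation (p) plus imitation (q times the share that already
        # adopted), applied to the remaining potential adopters (m - n).
        new_adopters = (p + q * n / m) * (m - n)
        cumulative.append(n + new_adopters)
    return cumulative


# Hypothetical parameters chosen only to produce an S-shaped curve.
curve = bass_adopters(p=0.01, q=0.4, m=1000.0, steps=30)
```

The resulting curve is S-shaped: slow initial uptake driven by `p`, then rapid growth driven by `q`, then saturation near `m`.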
Gyimesi Péter (SZTE)
Software bugs are unavoidable during the development of a system for many reasons: tight deadlines, imprecise specifications, programmer carelessness, and so on. Finding and fixing these bugs is a resource-intensive task. Numerous studies address the detection of software bugs, and they apply different approaches. They have one thing in common, however: the methods have to be tested somehow, and their effectiveness has to be measured. For this purpose, public bug databases have been published, which serve as benchmarks for this kind of research. In this talk we present an efficient method for producing such bug databases, which uses a graph database (Neo4j).
Ivan Luković and Vladimir Ivančević (University of Novi Sad)
Ivan Luković: Formal Education in Data Science – A Perspective of Serbia and Faculty of Technical Sciences
In recent years, Data Science has become an emerging education and research discipline all over the world. The software industry shows an increasing and even quite intensive interest in academic education in this area. A similar trend has been noticed in Serbia, particularly in Belgrade and Novi Sad. In this talk, we discuss the main motivating factors for creating a new study program in data science at the Faculty of Technical Sciences of the University of Novi Sad. We also present a short survey of software industry needs for data science experts, and discuss how we structured the new study program and addressed the main issues arising from clearly evident industry requirements. The program was accredited in 2015 at both the bachelor and master levels, and this school year is its first run, from which we expect to gain new experience.
Vladimir Ivančević: An Overview of Selected Research Studies in Data Analytics and Data Science at the Faculty of Technical Sciences in Novi Sad, Serbia
Over the past several years, the Faculty of Technical Sciences in Novi Sad, Serbia, has been home to an increasing number of research studies concerned with the extraction of potentially valuable information from diverse data sets. Many of these research efforts were concentrated at the Chair of Applied Computer Science during a period in which data science started emerging as one of the most popular and promising new fields associated with computer science and informatics. The two most notable areas covered by these studies are education and medicine, both of which play an important role in contemporary society and generate ample quantities of data. The education-focused studies dealt with a wide set of topics: a) exploring patterns in the spatial distribution of students in a classroom, b) analysing student grades with respect to different factors, c) constructing programming tests automatically, and d) identifying the position of engineering and technology education within the system of higher education in Serbia. The medicine-related studies centered upon epidemiology-oriented topics: a) creating a software system to support business intelligence in epidemiology and b) analysing collected data about early childhood caries to identify risk factors and create predictive models. Studies from both areas have provided some interesting findings and also managed to spur ideas for new research directions in the theory and practice of data analytics and data science.
Sándor Szabó (University of Pécs): Estimating clique number in high performance computing environment
Suppose A is an algorithm that computes an upper estimate for the clique number of a given graph G. Using algorithm A, one can construct a new algorithm A' that provides an improved estimate of the size of the maximum cliques in G. The fact that such an algorithm A' exists does not come as a surprise. The point is that the construction is practical and the new algorithm A' is well suited to various high performance computing environments. We carried out a large-scale numerical experiment to test our proposal.
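As an illustration of what an upper-estimate algorithm A might look like, here is a minimal sketch based on greedy vertex coloring. It relies on the standard fact that the number of colors in any proper coloring bounds the clique number from above. This is an assumed stand-in for A, not the speaker's algorithm, and the construction of A' is not attempted.

```python
def greedy_coloring_bound(adj):
    """Upper bound on the clique number via greedy coloring.

    adj maps each vertex to the set of its neighbors (undirected graph).
    """
    # Color high-degree vertices first (a common heuristic order).
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:  # smallest color not used by a neighbor
            c += 1
        color[v] = c
    return max(color.values()) + 1 if color else 0


# A 5-cycle has clique number 2; the greedy bound gives 3 here.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
bound_cycle = greedy_coloring_bound(cycle5)
bound_triangle = greedy_coloring_bound(triangle)
```

The 5-cycle shows that such a bound need not be tight, which is the gap an improved algorithm A' would aim to narrow.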
Kristóf Kovács (BME): Facility location on networks, with hard to compute objective functions
This talk will be about facility location problems on networks with objective functions that are especially hard to compute. Evaluating these functions at any point requires the solution of an NP-hard problem. Because of this property, general global optimization algorithms are inefficient at solving these problems. I will introduce the general Stackelberg problem, where two or more firms compete for demand by locating facilities one after the other. The choice of location by one firm influences the choices of the other firms, as each wants to maximize its profit after the facilities are built. Another hard-to-solve problem I will talk about is the 1-median problem with demand surplus, where one facility has to be located such that only a given percentage of the demand has to be covered. Finally, I will present the solution and the computational results for a Stackelberg problem, as well as for a modified 1-median problem, both of which have hard-to-compute objective functions.
Nagy Gyula (SZTE): Scientometric and content analyses using text mining methods
Abstract: For those active in scientific life, international citation databases such as the Web of Science or Scopus are increasingly familiar and indispensable. At the same time, there are entire fields of science in Hungary that are only weakly represented in these databases [Scival, Hungary 2011-2015: Social Sciences 4.4%; Arts and Humanities 2.7%; Other 7.7%]. The phenomenon has several causes: on the one hand, the publishing habits of the "soft science" disciplines differ considerably from the publishing practice of the "hard science" disciplines (e.g. edited volumes, monographs, etc.); on the other hand, Hungarian researchers in these fields readily publish in journals that the international citation databases do not index. Nevertheless, these researchers would have the same basic need to explore the internal structure of their own field. We can meet this need by uncovering the networks of scientific collaboration and by constructing the citation graph of a single journal or even of an entire discipline. To carry out such an investigation, it is worth drawing on the tools of text mining and network science alongside the established methods of scientometric analysis. The talk presents the experience of a pilot project in which we attempted a complete scientometric analysis of two Hungarian-language social science journals. In our research we paid special attention to producing the co-authorship graphs and to constructing the complete citation network of the journals, which, because of the large number of items, was feasible only through automatic extraction of the citations. Besides the scientometric analyses, for one of the journals discussed we also attempted a complex content analysis using text mining techniques.
István Hegedűs (University of Szeged): Gossip-Based Machine Learning and Matrix Decomposition
Abstract: We speak of distributed learning when we wish to build models or compute aggregations on data sets that are stored on a large set of computers. Standard methods for processing distributed data first collect the data in data centers and then run centralized algorithms. But models can also be built on the data sets without centralized algorithms. In my presentation, I will talk about distributed data aggregation and modeling, and introduce the basics of machine learning and classification. Finally, I will present a gossip-based solution for distributed learning and an algorithm that can be used for low-rank matrix decomposition.
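To make the idea of decentralized aggregation concrete, the sketch below simulates randomized pairwise gossip averaging, a textbook gossip protocol in which two randomly chosen nodes repeatedly replace their values with their mean. It is an assumed illustration of gossip-style aggregation in general, not the learning or matrix decomposition algorithm presented in the talk. The function name, the seed and the sample values are hypothetical.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Simulate pairwise gossip averaging over a fully connected overlay."""
    rng = random.Random(seed)
    state = list(values)
    n = len(state)
    for _ in range(rounds):
        # Two distinct nodes exchange values and both keep the mean.
        i, j = rng.sample(range(n), 2)
        mean = (state[i] + state[j]) / 2.0
        state[i] = state[j] = mean
    return state


# Each node starts with a local value; no node ever sees all the data.
local = [1.0, 5.0, 9.0, 3.0, 7.0]
estimates = gossip_average(local, rounds=200)
```

Every exchange preserves the global sum, so all local estimates drift toward the global mean (5.0 for this sample) without any central collection step.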
Péter Erdős (Rényi Institute): Degree Sequences' Realizations
Abstract: This talk can be considered a prequel to István Miklós' talk on March 23, 2017, but it is completely self-contained. It discusses realization and listing problems for several kinds of degree sequence questions. These are fundamental tools for studying synthetic graphs which simulate real-life networks.
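The simplest realization question is whether a given sequence is the degree sequence of some simple undirected graph. A minimal sketch of the classical Erdős–Gallai test is given below. It is included only as background for the realization problem and is not claimed to be the material of the talk.

```python
def is_graphical(degrees):
    """Erdős–Gallai test: can `degrees` be realized by a simple graph?"""
    d = sorted(degrees, reverse=True)
    if any(x < 0 for x in d) or sum(d) % 2 == 1:
        return False
    n = len(d)
    for k in range(1, n + 1):
        # The k largest degrees can be absorbed by edges among themselves
        # (at most k*(k-1)) plus at most min(d_i, k) edges to each other vertex.
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True


star = is_graphical([4, 1, 1, 1, 1])       # a star on five vertices
impossible = is_graphical([3, 3, 1, 1])    # two degree-3 vertices force degree >= 2 elsewhere
```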
István Miklós (Rényi Institute): Exact sampling of graphs with prescribed degree correlations
Abstract: Many real-world networks exhibit correlations between the node degrees. For instance, in social networks nodes tend to connect to nodes of similar degree. Conversely, in biological and technological networks, high-degree nodes tend to be linked with low-degree nodes. Degree correlations also affect the dynamics of processes supported by a network structure, such as the spread of opinions or epidemics. The proper modelling of these systems, i.e., without uncontrolled biases, requires the sampling of networks with a specified set of constraints. We present a solution to the sampling problem when the constraints imposed are the degree correlations. Furthermore we would like to give a brief survey of our current knowledge of generating such random networks.
Federico Musciotto (University of Palermo and CEU): Timely evolution and core of communities of statistically validated projections of bipartite networks
Abstract: Many complex systems are naturally organized in bipartite networks, i.e. networks with two disjoint sets of nodes, usually characterized by high heterogeneity. Although useful information can be extracted from these networks, their projections are not always stable and robust against errors in the data or other sources of noise. A solution to this issue is based on statistically validated networks (SVN), which reduce the original system by considering only the links which are statistically significant. In this way, cores of the real communities of the system are obtained with a high level of precision. Moreover, the SVN methodology can be extended to track the evolution of a system's communities over time. In this talk I will show the results of these methods on a dataset of Finnish stock market investors, characterized by high heterogeneity with respect to trading activity. This work is the result of a collaboration with Luca Marotta, Jyrki Piilo and Rosario N. Mantegna.
András Bóta (The University of New South Wales, Australia): Learning infection processes
Abstract: Infection models can be used to model the spread of disease, information, behaviour and many other things through a network composed of connected nodes. One of the common challenges arising in the application of infection models is the lack of available transmission probabilities. The task of inverse infection is the systematic estimation of these values. Several methods have been proposed recently for solving this task. A common property of these approaches is that they make many specialized assumptions about the problem they are trying to solve. In contrast, the method we discuss in this talk gives a general framework for inverse infection tasks and offers much-needed flexibility, allowing the method to be used in a variety of real-life applications. In the second part of the talk we are going to discuss a specific application of the generalized inverse infection model. Modern transportation infrastructure allows quick and efficient travel between distant parts of the world, but unfortunately also offers many ways to transfer diseases between regions. Recent examples of global outbreaks include the MERS, SARS and swine flu epidemics and, most recently, the 2015 Zika fever outbreak in Brazil. A critical component of outbreak control is the identification of disease spreading from region to region. We will show how the proposed inverse infection model can be used to 1. accurately model the Zika virus outbreak on the country level, 2. estimate travel risk between regions.
Brys Zoltán (LAM, BME-TMIT): Network research in public health and pharmaceutical marketing. Case report and overview. Are drug prescriptions, acne and headaches really "socially contagious"?
Abstract: Graph theory has long been used in modelling the spread of infectious diseases and has produced important results (Epstein, Balcan, Lilrejos, etc.). The analysis of the protein-protein interaction network has become an integral part of bioscience (genetics, epigenetics, drug development, etc.). With the spread of imaging techniques and optogenetics, graph theory is increasingly used in neurology as well. In public health, great debate was sparked by Christakis and Fowler's 2007 proposition that obesity and smoking, regarded as significant causes of noncommunicable diseases, are "socially contagious". The proposition received considerable criticism, mainly on statistical (Lyons) and social-psychological/sociological (Buda) grounds. In a witty paper, Cohen-Cole et al. (Yale) used a similar methodology to show that, by Christakis and colleagues' logic, network effects also operate in the spread of acne and headaches. The promise that network analysis would become a kind of "theory of everything", an all-explaining "miracle method", does not seem to be coming true. Unfortunately, some star authors working with network analysis, in possession of the "miracle method", easily and imperceptibly (and surely not deliberately) neglected the body of knowledge accumulated in the social sciences and related disciplines (e.g. sociology, addictology, etc.). Through a real pharmaceutical-industry case of Clinical Opinion Leader detection, the speaker tries to show that a process similar to the one in public health appears in the applied sciences, particularly in marketing and data analysis. In the speaker's personal view, the hype phase of network analysis is slowly coming to an end and the true value of the method is slowly emerging; that value is considerably smaller than what was promised, but still extremely significant.
Ivan Fekete (Semmelweis University and Link-group): Computational prediction of personalized drug combinations
Abstract: Numerous methods and biological networks have been used to model intracellular signal transduction, but so far the clinical applicability of these approaches has remained rather limited. Here we describe a novel system (Turbine, http://turbine.hu) for the reliable simulation of intracellular signaling. This required, first, utilizing a large, fully dynamically reviewed, manually curated network of major human signaling pathways and their transcriptional regulatory mechanisms, and second, running ensembles of simulations and extracting the resulting attractors (steady states) of the system. The software finds the attractors of the signaling network, and correlates the attractors’ activity patterns with the activity of biological processes, like apoptosis or proliferation. By combining a large set of steady states, we were able to map the cellular attractor landscapes, which varied depending on the presence of different physiological ligands, available membrane receptors, or, most importantly regarding clinical applicability, mutated proteins. This approach, combined with additional omics data layers (such as cancer genomic and transcriptomic profiles) and artificial intelligence systems, made Turbine able to predict potential mono- or combination drug therapies on a personalized basis.
Laszlo Kovacs (University of West Hungary, Department of Applied Linguistics): Linguistic networks
Abstract: Network structures are not new in linguistics: networks have been used since the 1960s to explain linguistic phenomena. In the first part of the lecture a broad picture of linguistic network research will be given, showing where networked structures exist in language. The main focus of the lecture is on the networked structures of the mind: using the example of the Hungarian database "ConnectYourMind", it will be shown which structures exist in the mental lexicon ("dictionary of the mind"). The presented results are joint work with Andras Bota, Laszlo Hajdu and Miklos Kresz (University of Szeged) and with Peter Pollner and Katalin Orosz (Eotvos University).
Gyorgy Turan (University of Szeged and University of Illinois at Chicago): Betweenness centrality
Abstract: The betweenness centrality of a vertex in a network measures the number of shortest paths containing the vertex. The local version considers only shortest paths of bounded length. We review related results, and then discuss the behavior of the local version for trees, including worst-case and scale-free random trees. Joint work with Ben Fish and Rahul Kushwaha.
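The definitions above can be sketched in a few lines. The code below uses the common normalized form of betweenness (for each pair of endpoints, the fraction of shortest paths passing through the vertex) and restricts it to pairs whose distance is at most `max_len` for the local version. This brute-force formulation and the names used are assumptions for illustration, and it is intended only for small undirected graphs.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (dist, sigma): BFS distances and shortest-path counts from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
    return dist, sigma


def local_betweenness(adj, max_len=None):
    """Betweenness over shortest paths of length <= max_len (all paths if None)."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {v: 0.0 for v in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for t in adj:
            if t == s or t not in dist_s:
                continue
            if max_len is not None and dist_s[t] > max_len:
                continue
            for v in adj:
                if v == s or v == t or v not in dist_s:
                    continue
                dist_v, sigma_v = info[v]
                # v lies on a shortest s-t path iff the distances add up.
                if t in dist_v and dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    # In an undirected graph every pair was visited in both orders.
    return {v: c / 2.0 for v, c in score.items()}


# A path a-b-c-d-e, which is also a (very small) tree.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
full = local_betweenness(path)
local2 = local_betweenness(path, max_len=2)
```

On this path the middle vertex `c` has global betweenness 4, but with `max_len=2` every inner vertex scores 1, which shows how the local version flattens differences that come from long paths.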
Nandor Poka (research assistant, Department of Applied Informatics, and doctoral candidate, Doctoral School of Multidisciplinary Medical Science, University of Szeged): Combinatorial Scientific Computing and (Computational) Biology: When the demand meets the offer
Abstract: The rise of the BigData and High-Performance Computing era brought the flourishing of many fields of science, including Combinatorial Scientific Computing. Both the business and the scientific communities produce such vast amounts of data that new technologies and algorithms are required to analyze them. To efficiently utilize highly parallel computers or clusters, tasks must be decomposed and the data must be partitioned, and these steps themselves involve graph algorithms. Combinatorial Scientific Computing methods have been used, e.g., in the aforementioned load distribution, automatic differentiation, statistical physics, and other enigmatic areas. However, more “life-like” applications of these methods have been and are being put to use in other fields of science, such as modern biology. In our talk we will present various biological problems (both “old” and more recent) that can be fairly easily represented with mathematical objects such as graphs and matrices. Our main focus among these problems will be the present and future challenges of Next-Generation Sequencing, and how to tackle them using Data Science and Combinatorial Scientific Computing.
Miklos Kresz (Department of Applied Informatics, University of Szeged; joint work with Andras Bota and Andras Pluhar): Dynamic network mining and business intelligence
Abstract: During the last decade social network analysis and mining became a key research area. Apart from the obvious applications in on-line social network services, the field plays a central role in many classical business intelligence tasks such as customer attrition, risk analysis and campaign management. In order to capture the characteristics of the above problems, the dynamics of the corresponding network processes and of the changes in the network structure need to be studied. In this talk we will consider two relevant problems. Dynamic community detection is an algorithmic tool for the analysis of the lifetime of communities in real graphs. The study of infection processes in networks poses several algorithmic and modelling questions, such as maximizing the spread of influence or approximating the real infection values. In addition to reviewing applied models and methods, the talk will also present real-life applications.
Bogdan Zavalnij (University of Pecs): Mathematical modeling of various problems by graphs and solving them with clique search
Abstract: In our talk we would like to demonstrate the modeling power of graphs. We will present problems from different fields and show that they can be reformulated as graph problems. Solving these problems then reduces to solving standard graph problems such as various coloring and clique problems. The problems included in the talk will be from simple games, scheduling, stock exchange, drug design and even graph problems themselves -- including hypergraph coloring problems. We would like to show some useful techniques and also some common methods for such reformulations.
Christian Bongirono (University of Palermo): The dual-projection approach for bipartite networks using statistical link validation
Abstract: Bipartite complex networks are usually analyzed by projecting the two disjoint sets of nodes into two networks and then applying standard techniques to each separately. Since complex systems are often very heterogeneous, it is very difficult to distinguish links of the projected network that merely reflect the system’s heterogeneity from links that are relevant for unveiling the properties of the system. To avoid this problem, a methodology has been developed for one-mode projections of bipartite networks using an unsupervised statistical link validation. In order to study the efficiency of the method, we investigate the community structure of the projected network using both a simple projection and the statistically validated projection on various synthetic benchmarks and real networks. In all these cases the link-validating filtering procedure increases the precision, and its use is suggested even considering the drawback that it decreases the level of accuracy in certain situations.
Christian Bongirono (University of Palermo): Statistical Regularity in the Air Traffic flow
Abstract: Aircraft trajectories are composed of a sequence of fixed spatial points (NVP) that typically diverge from the best route. This infrastructure allows air traffic controllers to direct the air traffic flow along standard airways, and focuses their attention on a small number of special NVPs where the routes converge. As a drawback, the non-optimal routes force air traffic controllers to modify the routes, where possible, to enhance the air traffic flow. The aim of the first part of the talk is to highlight the behaviour of air traffic controllers with respect to a network optimization operation named "direct", through the observation of stylized facts at both the global trajectory level and the local navigation point level. The second part of the talk will discuss how an Agent Based Model could help us understand the transition from the current NVP-based network to the future SESAR scenario, where aircraft will be allowed to follow a free-route path.
Gabor Berend (Institute of Informatics): Learning the transition matrix for weighted PageRank
Abstract: The PageRank algorithm is a widely known and well understood approach to assigning importance scores to nodes in networks. This highly applicable approach relies on the assumption that all the connections of a node have equal importance (i.e. inversely proportional to its out-degree). The talk will introduce approaches that aim at determining the strength of the connections between pairs of nodes, given that the PageRank scores (or some relative importance scores) of the nodes within the graph are assumed to be known in advance. After presenting these models, both quantitative and qualitative results will be presented on synthetic and real-world networks (e.g. citation and collaboration networks, the English and Hungarian Wikipedia and networks generated from language usage).
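To show how edge strengths enter the computation, here is a minimal sketch of PageRank by power iteration with a weighted transition matrix. With all weights equal it reduces to the standard algorithm. It covers only the forward direction (weights to scores). The inverse task addressed in the talk, learning the weights from known scores, is not attempted. The damping factor, the function name and the toy graph are assumptions.

```python
def pagerank(weights, damping=0.85, iters=100):
    """Power-iteration PageRank; weights[u][v] is the strength of edge u -> v."""
    nodes = sorted(set(weights) | {v for out in weights.values() for v in out})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = weights.get(u, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: spread its rank uniformly over all nodes.
                for v in nodes:
                    nxt[v] += damping * rank[u] / n
            else:
                # Transition probability is proportional to the edge weight.
                for v, w in out.items():
                    nxt[v] += damping * rank[u] * w / total
        rank = nxt
    return rank


# Same toy graph twice: equal weights vs. a stronger a -> b connection.
uniform = pagerank({"a": {"b": 1, "c": 1}, "b": {"c": 1}, "c": {"a": 1}})
weighted = pagerank({"a": {"b": 3, "c": 1}, "b": {"c": 1}, "c": {"a": 1}})
```

Raising the weight of the edge from `a` to `b` raises the score of `b`, which is the kind of dependence an inverse method would have to exploit.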
Gabor London (MTA-SZTE Stereochemistry Research Group): Accomplishments and challenges of network science in chemistry
Abstract: Approaching chemical problems with the tools of network science has proved very fruitful in recent years. The network approach has provided us with new ways of designing drugs, optimizing multistep chemical syntheses or fighting terrorism. Most of these successes, however, are based on the analysis of available data. One of the big challenges chemists are facing is whether it is possible to use this data-based knowledge to create instructable molecular networks that can serve as models for early (molecular) evolution or perform complex (synthetic) tasks. The talk will discuss both the data-based achievements and the recent attempts towards creating “molecular ecosystems”.
Istvan Kiss (SZTE Knowledge Management Research Center): Communities and central nodes in the network of the mobile inventors in the United States
Abstract: IP-intensive industries account for about 33-40% of the gross domestic product in western economies. These intellectual properties are mostly embodied in patents and trademarks. The largest share of expenditure in this sector goes to the wages of white-collar workers, which underlines the importance of intellectual capital and knowledge in the creation and development of an IP portfolio. We investigated the flow of knowledge, a crucial resource, among organizations by analysing patent documents from the United States. In our network, organizations are the nodes and the mobility of researchers among them forms the edges. This graph can be considered the informal innovation network of US organizations, where firms, universities and governmental institutions compete for knowledge and recombine their innovation capacity through the mobility of inventors.
Tamas Vinko (Institute of Informatics): Network models for BitTorrent communities
Abstract: BitTorrent communities are content sharing systems using peer-to-peer technology. From a mathematical point of view, these communities can be modeled with graphs of a particular structure. For example, there is a straightforward bipartite representation, where we take the users and the files as vertices and the edges represent supply and demand. Using this simple representation one can already consider interesting optimization problems. Moreover, there is a richer graph representation which leads to a flow network. Hence, one can start thinking about applying traditional flow algorithms and about their meaning in this particular context.
Andras London (Institute of Informatics): Statistically validated projections of bipartite networks
Abstract: Bipartite networks appear naturally in systems ranging from social to biological; examples include, among many others, the actors--movies network, artists--music network, scientists--research papers cooperation network, network of sexual contacts, diseases--genes network, plants--pollinators mutualistic networks, banks--firms money transfer networks and words co-occurrence networks. Many properties of these networks are typically investigated by constructing and analyzing a projected network on one of the two sets of the original network. When one constructs a projected network from the nodes of only one set, the original network's heterogeneity (e.g. the heterogeneous degree distribution) makes it difficult to determine those links whose presence in the projected network cannot be explained by random co-occurrence of their neighbors in the original network. In this talk we present a method based on statistical validation to overcome this problem and show some possible applications to real bipartite systems.
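One way to make "cannot be explained by random co-occurrence" operational is sketched below. It assumes a hypergeometric null model for the number of shared neighbors of two nodes and a Bonferroni correction over all tested pairs. The abstract does not state which null model or correction the speaker uses, so both are assumptions, as are the function names and the toy data.

```python
from itertools import combinations
from math import comb


def hypergeom_tail(n_total, k_i, k_j, shared):
    """P(X >= shared) when k_j items are drawn from n_total, k_i of them marked."""
    denom = comb(n_total, k_j)
    upper = min(k_i, k_j)
    hits = sum(comb(k_i, x) * comb(n_total - k_i, k_j - x)
               for x in range(shared, upper + 1))
    return hits / denom


def validated_projection(neighbors, n_total, alpha=0.01):
    """Keep projected links whose co-occurrence is unlikely under the null.

    neighbors maps each node of one set to its neighbor set in the other set;
    n_total is the size of that other set.
    """
    pairs = list(combinations(sorted(neighbors), 2))
    threshold = alpha / len(pairs)  # Bonferroni correction (assumed choice)
    links = []
    for i, j in pairs:
        shared = len(neighbors[i] & neighbors[j])
        if shared == 0:
            continue
        p = hypergeom_tail(n_total, len(neighbors[i]), len(neighbors[j]), shared)
        if p < threshold:
            links.append((i, j, p))
    return links


# Toy data: "a" and "b" share 9 of 10 neighbors, "c" shares only one with each.
neighbors = {
    "a": set(range(10)),
    "b": set(range(9)) | {20},
    "c": {0} | set(range(30, 39)),
}
links = validated_projection(neighbors, n_total=40)
```

A plain projection would connect all three nodes, since each pair shares at least one neighbor. Under these assumptions only the link between `a` and `b` survives validation.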