Workshop Program

 

3:00 – 4:00 Professor Xiaoru Yuan (Peking University) “Urban Data Visualization”

Understanding the complex nature of activities in modern metropolitan regions is difficult due to the vast amount of data that must be processed and analyzed. Visualization provides an essential means for users to comprehend such big data and gain insights, which is crucial for decision makers, political figures, and the general public. This talk will discuss visualization cases covering various types of urban data, including taxi GPS data, vehicle RFID data, subway IC card data, and social media data, and demonstrate how different data sets can be integrated for advanced visual analysis. With the assistance of properly designed visualization and interaction, both the general public and experts can interactively conduct data exploration, mental image construction, and insight discovery.

4:00 – 5:00 Professor Jaegul Choo (Korea University) “Joining Forces from Data Mining and Visual Analytics for Large-scale High-dimensional Data”

Visual analytics, which leverages human exploration via interactive visualization in data analysis, has recently gained popularity. Data mining methods often play a crucial role in visual analytics by providing important insights into data. In this talk, I will present both fundamental approaches and visual analytics systems that join forces from data mining and visual analytics.

First, I will introduce a visual analytics system called the FODAVA testbed, where users can obtain a crucial understanding of data by applying various data mining methods in an interactive visual manner. Second, I will present another system called UTOPIAN (User-driven Topic Modeling based on Interactive Nonnegative Matrix Factorization), which redesigns computational methods to support various user interactions with real-time responses in visual analytics. Several usage scenarios of UTOPIAN will be presented using real-world data sets. Finally, my ongoing work and future directions will be discussed.

5:00 – 6:00 Professor Le Song (Georgia Institute of Technology, Department of Computational Science and Engineering, College of Computing, USA) “Topic Modeling from Continuous-Time Document Streams with Dirichlet Hawkes Processes”

Topics and clusters in document streams, such as online news articles, can be induced by their textual contents, as well as by the temporal dynamics of their arrival patterns. Can we leverage both sources of information to obtain a better clustering of the documents, and distill information that cannot be extracted from text alone? I will talk about a novel random process, referred to as the Dirichlet Hawkes process, that takes into account both text and temporal information in a unified framework. A distinctive feature of the proposed model is that the preferential attachment of items to clusters according to cluster sizes, present in Dirichlet processes, is now driven by the intensities of cluster-wise self-exciting temporal point processes, the Hawkes processes. This new model establishes a previously unexplored connection between Bayesian nonparametrics and temporal point processes, which lets the number of clusters grow to accommodate the increasing complexity of online streaming contents while at the same time adapting to the ever-changing dynamics of the respective continuous arrival times. We conducted large-scale experiments on both synthetic and real-world news articles, and show that Dirichlet Hawkes processes can recover both meaningful topics and temporal dynamics, which leads to better predictive performance in terms of content perplexity and arrival time of future documents.
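
As a rough illustration of the intensity-driven preferential attachment described in this abstract, the following Python sketch computes the prior probability that a newly arriving document joins each existing cluster or starts a new one. It is not the speaker's implementation: the function name `cluster_assignment_probs`, the parameters `base_rate` and `decay`, and the exponential triggering kernel are all assumptions made for illustration, and the text-likelihood part of the model is left out entirely.

```python
import math


def cluster_assignment_probs(t, history, base_rate=1.0, decay=1.0):
    """Prior cluster probabilities for a document arriving at time t.

    history   -- list of (arrival_time, cluster_id) pairs for earlier
                 documents, with integer cluster ids (assumed format)
    base_rate -- hypothetical constant rate at which new clusters appear
    decay     -- hypothetical decay rate of the assumed exponential kernel
    """
    # Each past document in a cluster adds a decaying contribution to
    # that cluster's Hawkes intensity at time t.
    intensities = {}
    for t_i, k in history:
        contribution = math.exp(-decay * (t - t_i))
        intensities[k] = intensities.get(k, 0.0) + contribution

    # Probabilities are proportional to the cluster intensities, with the
    # base rate playing the role of the "open a new cluster" weight.
    total = base_rate + sum(intensities.values())
    probs = {k: lam / total for k, lam in intensities.items()}
    probs["new"] = base_rate / total
    return probs


if __name__ == "__main__":
    # Cluster 0 was active long ago; cluster 1 received a document recently.
    history = [(0.0, 0), (0.5, 0), (9.9, 1)]
    print(cluster_assignment_probs(10.0, history))
```

Under these assumptions, a cluster that received a document recently attracts the next one more strongly than a larger but long-inactive cluster, which is where the behaviour departs from size-based attachment in a plain Dirichlet process.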