Visual Analytics for Topic Model Optimization

Introduction

Our group chose the research paper "Visual Analytics for Topic Model Optimization based on User-Steerable Speculative Execution" for this study. Our purpose is to analyse the visualization system presented in the paper.

Content Summary

Topic modelling is a method for finding sets of words in a collection of documents that best represent the information in that collection. It helps us summarize and organize large bodies of text.

In this paper, the target users are those who work in the field of topic modelling.

The main goal is to improve analytic algorithms for streaming text data so that topic modelling becomes more explainable and easier to visualize.

The problems the authors address are the limited trustworthiness of topic modelling results and the lack of interactivity between users and the system. To tackle them, the authors introduce the Incremental Hierarchical Topic Model (IHTM), which integrates users' domain knowledge into every step of the decision-making process.

In addition, the authors develop a tailored visual analytics workspace that lets users track, step by step, how the algorithm makes its decisions.

Figure 1 shows the visual analytics workspace of this research paper. The workspace has five main sections.

Figure 1: Visual Analytics Workspace

Document-Log (right-hand side): Every newly inserted document is added to the document log before it is shown in the topic tree.

Topic-Tree (center): The topic tree is the main component of the system. It shows the model space and the algorithm's step-by-step decision-making process, as well as topic hierarchies, keywords and uncertainties.

Speculative Execution (bottom): This section lets users compare alternative results before deciding which one to accept. It also shows how different versions of the topic tree would develop over several further processing steps.

Timeline (bottom): This section lets users track how the quality metrics develop and see the impact of speculative execution.

Control Panel (left-hand side; optimization speculations and document groups): This panel lets users view, accept or reject the proposed optimizations. It also lets them pause the algorithm, set its speed and trigger speculative execution manually. The optimizations are ranked by their effectiveness. The colored nodes on the left-hand side show which document group each node belongs to.
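The idea of ranking speculative optimizations can be illustrated with a small Python sketch: each candidate change is applied to a copy of the current model, scored, and ranked by its gain, while the live model stays untouched until the user accepts one. The toy model (topic label mapped to keywords), the `distinctiveness` metric and the candidate operations are hypothetical stand-ins, not the paper's. The paper's two-fold speculative execution also runs the algorithm forward over several processing steps, which this sketch omits.

```python
import copy
from typing import Callable, Dict, List, Tuple

Model = Dict[str, List[str]]  # hypothetical toy model: topic label -> keywords


def distinctiveness(model: Model) -> float:
    """Hypothetical quality metric: share of keyword slots whose keyword occurs in only one topic."""
    words = [w for keywords in model.values() for w in keywords]
    if not words:
        return 0.0
    return sum(1 for w in words if words.count(w) == 1) / len(words)


def speculate(model: Model, candidates: Dict[str, Callable[[Model], Model]],
              metric: Callable[[Model], float]) -> List[Tuple[str, float, Model]]:
    """Score every candidate on a deep copy and rank by metric gain, best first."""
    baseline = metric(model)
    results = []
    for name, operation in candidates.items():
        trial = operation(copy.deepcopy(model))  # the live model is never modified here
        results.append((name, metric(trial) - baseline, trial))
    return sorted(results, key=lambda r: r[1], reverse=True)


def merge_economy_finance(m: Model) -> Model:
    """Hypothetical candidate: merge two overlapping topics."""
    m["economy"] = sorted(set(m["economy"]) | set(m.pop("finance")))
    return m


def drop_tax_from_politics(m: Model) -> Model:
    """Hypothetical candidate: remove a shared keyword from one topic."""
    m["politics"] = [w for w in m["politics"] if w != "tax"]
    return m


model = {
    "economy": ["market", "stocks", "tax"],
    "finance": ["market", "stocks", "bank"],
    "politics": ["vote", "debate", "tax"],
}
ranking = speculate(
    model,
    {"merge economy+finance": merge_economy_finance, "drop tax from politics": drop_tax_from_politics},
    distinctiveness,
)
for name, gain, _ in ranking:
    print(f"{name}: {gain:+.3f}")
```

Working on deep copies is what keeps the decision with the user: nothing changes in `model` until one of the ranked trial models is explicitly accepted.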

The paper also reports three independent case studies that evaluate the proposed technique in terms of understanding, diagnosing and refining topic models. Both quantitative and qualitative results are presented. The three cases are discussed further in the next section.

Validation Method Analysis

In this section, a bottom-up (technique-driven) approach is used for the validation analysis. There are four levels of validation:

Algorithm Level

The research paper notes that the most commonly used class of topic modelling algorithms is generative and probabilistic, e.g. LDA. Probabilistic models often produce high-quality results, but those results are difficult for users to understand, and the models offer little interactivity between users and the system.

The paper therefore introduces IHTM with two-fold speculative execution, aiming for a more effective and more easily visualized system for optimizing the topic modelling process.

Idiom Level

A hierarchical topic tree is built in the workspace by gradually inserting each document of the collection. The root node is an aggregation of the entire corpus, and documents are inserted based on a monotonic similarity function.
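To make incremental construction more concrete, here is a minimal Python sketch of inserting documents one at a time into a topic tree whose root aggregates the whole corpus. It is not IHTM itself: the paper's monotonic similarity function is replaced by plain cosine similarity over word counts, and the `Node` class, the `insert` function and the `threshold` value are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Node:
    terms: Counter                  # aggregated word counts of every document below this node
    doc_id: Optional[str] = None    # set only on document leaves
    children: List[Node] = field(default_factory=list)


def insert(root: Node, doc_id: str, text: str, threshold: float = 0.3) -> None:
    """Hypothetical greedy insertion: descend toward the most similar child topic."""
    vec = Counter(text.lower().split())
    leaf = Node(terms=Counter(vec), doc_id=doc_id)
    node = root
    while True:
        node.terms.update(vec)  # every node on the path, including the root, aggregates the document
        scores = [cosine(child.terms, vec) for child in node.children]
        if not scores or max(scores) < threshold:
            node.children.append(leaf)  # nothing similar enough: open a new branch here
            return
        i = scores.index(max(scores))
        best = node.children[i]
        if best.doc_id is not None:
            # the best match is a single document: group both under a new topic node
            node.children[i] = Node(terms=best.terms + vec, children=[best, leaf])
            return
        node = best  # descend into the most similar topic


root = Node(terms=Counter())
insert(root, "d1", "stocks market shares fell")
insert(root, "d2", "stocks market shares rose")
insert(root, "d3", "election debate candidate")
```

With the default threshold, the two stock-market sentences end up under a shared topic node, while the debate sentence opens a new branch directly under the root.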

Next, a timeline lets users track how the metrics develop. Every inserted document is shown as a bar, and all interaction events are noted at the bottom of the timeline so that the history can be traced. In addition, every new document is added to the document log before it is shown in the topic tree.
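The interplay of the document log and the timeline can be sketched as a small Python data structure: new documents wait in a queue, each insertion adds one bar (step number and metric value) to the timeline, and user interactions are logged against the step at which they happened. The `Workspace` class, its methods and the vocabulary-size metric are hypothetical stand-ins, and real tree insertion is reduced to appending to a list.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Tuple


@dataclass
class Workspace:
    metric: Callable[[List[str]], float]  # hypothetical metric computed over the inserted documents
    document_log: Deque[str] = field(default_factory=deque)  # documents waiting to be inserted
    tree: List[str] = field(default_factory=list)  # stand-in for the topic tree
    timeline: List[Tuple[int, float]] = field(default_factory=list)  # one bar per inserted document
    events: List[Tuple[int, str]] = field(default_factory=list)  # interaction history keyed by step

    def receive(self, doc: str) -> None:
        """A new document first lands in the document log."""
        self.document_log.append(doc)

    def step(self) -> None:
        """Move the oldest logged document into the tree and record a timeline bar."""
        self.tree.append(self.document_log.popleft())
        self.timeline.append((len(self.tree), self.metric(self.tree)))

    def interact(self, description: str) -> None:
        """Record a user interaction at the current step."""
        self.events.append((len(self.tree), description))


def vocabulary_size(docs: List[str]) -> float:
    """Hypothetical metric: number of distinct words inserted so far."""
    return float(len({word for doc in docs for word in doc.split()}))


ws = Workspace(metric=vocabulary_size)
for doc in ["stocks fell", "stocks rose", "debate tonight"]:
    ws.receive(doc)
ws.step()
ws.interact("user paused the algorithm")
ws.step()
```

After two steps, the third document is still waiting in the log, the timeline holds two bars, and the pause is recorded at step 1.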

Abstraction Level

Two corpora with different characteristics are used:

a collection of 200 news articles from the COHA corpus, and
a transcribed presidential debate.

Three datasets are used for the quantitative evaluation:

the presidential debate dataset,
a collection of 125 newsgroup articles from the 20 Newsgroups dataset, and
100 news articles from the Associated Press.

The requirements addressed by the workspace include:

From operation mode to speculation mode: users can compare and interact with multiple optimized models.
Comparative tree visualization: comprehensively highlights the differences between two models within the same topic tree.
Semi-supervised optimization.

The full list of requirements is discussed in the paper.

Domain Level

The target users are those who work in the field of topic modelling. The main goal is to improve analytic models for streaming data, enabling a more trustworthy, effective, explainable and visualisable analytic system for topic modelling.

The paper carries out three independent evaluations to validate the quality of the proposed model. First, a comparative study of model quality with three annotators validates the IHTM algorithm. This evaluation is based on the annotators' (experts') feedback on each topic model. The results show that IHTM produces results competitive with other topic models (HDP, hPAM and LDA).

Second, a user study with 6 participants, who optimized a model of a familiar dataset, verified with experts whether the tool achieves model improvements.

Third, a model quality improvement evaluation was carried out with 4 annotators across a variety of configurations.

Strengths and Weaknesses

Strengths:

The main strength of the system is that it integrates users' knowledge into the algorithmic process. The system is designed to be uncertainty-aware by using visual fuzziness (see Vehlow et al.). It automatically stops when a problem is detected during the algorithm's decision-making process, then prompts users and requests their feedback.
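The stop-and-ask behaviour can be sketched in Python as follows. The uncertainty test used here, a small margin between the two best-scoring placements, is an assumption for illustration and not the paper's criterion. `place`, `ask_user`, `margin` and `user_prefers_second` are hypothetical names, with `ask_user` standing in for the prompt shown in the workspace.

```python
from typing import Callable, Dict, List


def place(scores: Dict[str, float], ask_user: Callable[[List[str]], str], margin: float = 0.05) -> str:
    """Pick the best-scoring topic, but defer to the user when the top two are nearly tied."""
    ranked = sorted(scores, key=lambda topic: scores[topic], reverse=True)
    if len(ranked) > 1 and scores[ranked[0]] - scores[ranked[1]] < margin:
        return ask_user(ranked[:2])  # uncertain decision: pause and request feedback
    return ranked[0]  # confident decision: proceed automatically


def user_prefers_second(options: List[str]) -> str:
    """Hypothetical user who overrides the algorithm's first choice."""
    return options[1]


print(place({"economy": 0.62, "politics": 0.60}, user_prefers_second))
print(place({"economy": 0.80, "politics": 0.30}, user_prefers_second))
```

Only the nearly tied case reaches the user; the clear-cut placement proceeds without interrupting the algorithm.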

Next, because IHTM is implemented as the backbone technique, the system lets users adapt and guide the algorithm to their preferred topic granularity. Tailored interactions further help users understand the ongoing tree transformations. The design focuses on the newly created topic tree and on making changes easy to track, which improves users' understanding. It also lets the system incorporate users' feedback quickly, avoiding waiting times or having to redo the whole modelling process.

In addition, the workspace includes a timeline and speculative execution views. These let users track, step by step, how the algorithm makes its decisions. While observing the building process, users can stop the algorithm and interact directly with the topic tree to give feedback. Because speculations about the impact of possible decision outcomes are shown, users can also make more informed refinements.