Our group chose the research paper “Visual Analytics for Topic Model Optimization based on User-Steerable Speculative Execution” for this study. The purpose of our study is to analyse the visualization system presented in that paper.
Topic modelling is a method for finding sets of words in a collection of documents that best represent the information in the collection. It helps us to summarize and organize a large body of textual information.
In this paper, the target users are those who work in the field of topic modelling.
The main goal is to improve the analytic algorithm for streaming text data so that the topic modelling system becomes more explainable and easier to visualize.
The problems the authors address are the trustworthiness of topic modelling results and the lack of interactivity between users and the system. To address them, the authors introduce the Incremental Hierarchical Topic Model (IHTM), which integrates users’ domain knowledge into every step of the decision-making process.
In addition, a tailored visual analytics workspace was developed so that users can follow, step by step, how the algorithm makes its decisions.
Figure 1 shows the visual analytics workspace of the paper. The workspace has five main sections.
Figure 1: Visual Analytics Workspace
Document-Log (right-hand side): Every newly inserted document is added to the document log before it is shown in the topic tree.
Topic-Tree (center): The topic tree is the main component of the system. It shows the model space and the step-by-step decision-making process of the algorithm, as well as the topic hierarchies, keywords and uncertainties.
Speculative Execution (bottom): This section lets users compare the results of alternative optimizations before deciding to accept one of them. It also shows how the different versions of the topic tree develop further over several processing steps.
Timeline (bottom): This section lets users track the development of the metrics and see the impact of speculative execution.
Control Panel (left-hand side, with optimization speculations and document groups): This lets users view, accept or reject optimizations. It also allows them to pause the algorithm, set its speed and trigger speculative execution manually. The optimizations are ranked by their effectiveness. The colored nodes on the left-hand side show which document group each node belongs to.
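The idea behind the Speculative Execution section can be illustrated with a small sketch. It is not the authors' implementation: the model is reduced to a toy dictionary of topics and document ids, and the names `speculate`, `merge_misc`, `split_sports` and `balance` are hypothetical. The sketch only shows the general pattern of trying each candidate optimization on a copy of the model, scoring the outcome and ranking the candidates so that a user can accept or reject one.

```python
import copy
from typing import Callable, Dict, List, Tuple

# Toy stand-in for a topic model: topic name -> document ids (an assumption).
Model = Dict[str, List[str]]


def speculate(
    model: Model,
    candidates: Dict[str, Callable[[Model], None]],
    metric: Callable[[Model], float],
) -> List[Tuple[str, float, Model]]:
    """Apply each candidate to its own copy of the model and rank by metric."""
    results = []
    for name, apply_change in candidates.items():
        branch = copy.deepcopy(model)  # the current model stays untouched
        apply_change(branch)
        results.append((name, metric(branch), branch))
    results.sort(key=lambda r: r[1], reverse=True)  # best score first
    return results


def merge_misc(m: Model) -> None:
    # Hypothetical optimization: fold the small "misc" topic into "sports".
    m["sports"].extend(m.pop("misc"))


def split_sports(m: Model) -> None:
    # Hypothetical optimization: split the large "sports" topic in two.
    docs = m["sports"]
    half = len(docs) // 2
    m["sports"], m["sports_b"] = docs[:half], docs[half:]


def balance(m: Model) -> float:
    # Toy quality metric (an assumption): evenly sized topics score higher.
    sizes = [len(docs) for docs in m.values()]
    return -(max(sizes) - min(sizes))


current = {"sports": ["d1", "d2", "d3", "d4"], "misc": ["d5"]}
ranked = speculate(current, {"split": split_sports, "merge": merge_misc}, balance)
for name, score, _ in ranked:
    print(name, score)
```

Because every candidate runs on a deep copy, rejecting all of them leaves the current model exactly as it was, which is what makes the comparison safe to show before the user decides.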
In addition, three independent cases were carried out in the paper to evaluate the proposed technique in terms of understanding, diagnosing and refining topic models. Both quantitative and qualitative aspects of the study were evaluated and highlighted. The three cases are discussed further in the next chapter.
In this section, a bottom-up (technique-driven) approach is used to perform the validation analysis. There are four levels of validation, as below:
The paper found that the most commonly used class of topic modelling algorithms is generative and probabilistic. Probabilistic models such as LDA often generate high-quality results. Unfortunately, users find the results difficult to understand, and such models offer little interactivity between users and the system.
IHTM with two-fold speculative execution was introduced in the paper to provide a more effective, visualizable system for optimizing the topic modelling process.
A hierarchical topic tree is built in the model workspace. Documents from the collection are inserted one at a time, and the tree visualizes the data as it grows. The root node is an aggregation of the entire corpus, and documents are inserted based on a monotonic similarity function.
Next, a timeline lets users track the development of the metrics. Every inserted document is shown as a bar, and all interaction events are noted at the bottom of the timeline to record the history. In addition, every newly inserted document is added to the document log before it is shown in the topic tree.
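The incremental insertion described above can be sketched as follows. This is a minimal illustration and not the IHTM algorithm itself: cosine similarity over word counts and the `threshold` value are assumptions standing in for the paper's similarity function, and the names `Node`, `cosine` and `insert` are hypothetical. Splitting and merging of topics are omitted, so the tree in this sketch stays shallow.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Node:
    words: Counter = field(default_factory=Counter)  # aggregated word counts
    children: List["Node"] = field(default_factory=list)
    docs: List[str] = field(default_factory=list)


def insert(root: Node, doc_id: str, text: str, threshold: float = 0.3) -> Node:
    """Insert one document and return the topic node that now holds it."""
    vec = Counter(text.lower().split())
    node = root
    node.words.update(vec)  # the root aggregates the entire corpus
    while node.children:
        best = max(node.children, key=lambda c: cosine(vec, c.words))
        if cosine(vec, best.words) < threshold:
            break  # no child is similar enough
        node = best
        node.words.update(vec)
    if node is root or node.children:
        # Nothing similar at this level: open a new topic for the document.
        leaf = Node(words=Counter(vec), docs=[doc_id])
        node.children.append(leaf)
        return leaf
    node.docs.append(doc_id)  # similar leaf topic found: join it
    return node


root = Node()
insert(root, "d1", "topic model text")
insert(root, "d2", "topic model visual")
insert(root, "d3", "football match score")
for child in root.children:
    print(child.docs, [w for w, _ in child.words.most_common(2)])
```

In this toy run the first two documents share most of their words and end up in the same topic, while the third opens a new one. The most frequent words of each node serve as a crude stand-in for topic keywords.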
Two corpora with different characteristics
Three data sets for quantitative evaluation:
The detailed list of requirements was discussed.
In the paper, the target users are those who work in the field of topic modelling. The main goal is to optimize the analytic model for streaming data, enabling a more trustworthy, effective, explainable and visualizable analytic system for topic modelling.
Three independent cases were carried out in the paper to validate the quality of the proposed model. First, a comparative study of model quality with three annotators was conducted to validate the proposed IHTM algorithm. This evaluation is based on the annotators’ feedback on each topic model. The results show that IHTM generates results that are competitive with other topic models (HDP, hPAM and LDA).
Second, a user study on optimizing the model of a familiar dataset was conducted with 6 participants, to verify with experts whether the tool achieves model improvement.
Third, a model quality improvement evaluation was conducted with 4 annotators across a variety of configurations.
The main strength of the system is that it integrates users’ knowledge into the algorithmic process. The proposed system is designed to be uncertainty-aware by using visual fuzziness (see Vehlow et al.). When a problem is detected during the decision-making process, the system automatically stops, prompts the users and requests their feedback.
Next, the system allows users to adapt and guide the IHTM algorithm to their preferred topic granularity, as a backbone technique is implemented. Tailored interactions are designed to further aid users’ understanding of the ongoing tree transformation. This design focuses on the newly created topic tree and promotes tracking of changes, which makes the process easier for users to understand. It also allows the system to respond to users’ feedback quickly, avoiding waiting time or a rerun of the whole modelling process.
In addition, the workspace includes a timeline and a speculative execution view. These enable users to follow step by step how the algorithm makes its decisions. While observing the building process, users can stop the algorithm and interact directly with the topic tree to provide feedback. The system also supports more informed refinements, because speculations about the impact of possible decision outcomes are shown.