Data practitioners have widely adopted computational notebooks such as Jupyter Notebooks because they make it relatively easy to conduct and communicate data science workflows in narrative form. A key component in communicating such narratives is interactive visualizations rendered within widgets such as ipywidgets. These interactive widgets have been deployed to support workflows such as exploratory data analysis (e.g., B2) and model development for machine learning and computer vision tasks (e.g., Symphony). However, several gaps remain in the design of existing widgets that negatively impact data practitioners’ experiences.
Gaps in Existing Widget Design
Widgets operate as embeddable, lightweight interfaces with interactive components in the notebook front-end. User actions on the front-end trigger pre-defined data operations that update the widget state and re-render components accordingly. The widget state maintains the values of the front-end component properties, e.g., frequency distribution corresponding to a bar chart. Figure 1 displays such an interface that summarizes US age distribution per state (source: https://data.transportation.gov/). The component on the left displays the list of states as a table, and the component on the right displays the corresponding age distribution. Initially, the widget displays the average age distribution across all US states. Clicking a state in the table triggers data operations that recompute the corresponding distribution of the bar chart.
Figure 1: Example of an interactive widget.
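To make the state-and-re-render loop concrete, here is a minimal sketch in plain Python. It is not the code behind Figure 1: the class name, the toy records, and the age groups are all hypothetical, and the initial state is simplified to a count over all records rather than an average across states.

```python
from collections import Counter

# Hypothetical toy data: (US state, age group) records, not the real dataset.
RECORDS = [
    ("Alaska", "0-17"), ("Alaska", "18-64"), ("Alaska", "18-64"),
    ("California", "0-17"), ("California", "18-64"), ("California", "65+"),
    ("Florida", "18-64"), ("Florida", "65+"), ("Florida", "65+"),
]


class TraditionalWidget:
    """Toy widget that keeps only the latest state; each click overwrites it."""

    def __init__(self, records):
        self.records = records
        # Initial state: distribution over all records (a simplification).
        self.state = {"selected": None, "distribution": self._distribution(None)}

    def _distribution(self, us_state):
        """Pre-defined data operation: count records per age group."""
        ages = [age for s, age in self.records if us_state in (None, s)]
        return dict(Counter(ages))

    def on_click(self, us_state):
        """Front-end action: recompute the distribution and overwrite the state."""
        self.state = {
            "selected": us_state,
            "distribution": self._distribution(us_state),
        }
        return self.state


widget = TraditionalWidget(RECORDS)
widget.on_click("Alaska")
widget.on_click("California")
print(widget.state)  # only the California state is available at this point
```

After the second click, nothing in the object remembers that Alaska was ever selected.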
Despite the huge popularity and widespread adoption of these widgets, they are limited in several ways:
Lack of transparency and reusability. Existing widgets are effectively stateless with respect to history: they keep only the most recent state and do not track users’ interaction history or the corresponding state transitions. Consequently, the interaction history is lost, and users cannot access or reuse previous states. For example, in the widget shown in Figure 1, clicking a US state in the table recomputes the distribution shown in the bar chart, thereby triggering a widget state transition. Because the widget maintains only the most recent state, recovering a previous state requires users to repeat the interactions from scratch. However, in exploratory analysis a single user action rarely produces the desired outcome. Users often execute multiple actions before obtaining insights and drawing conclusions, so keeping the history of actions and allowing users to retrace their steps is important.
Lack of customizability. These widgets also lack affordances for end-users to customize, from the notebook, the built-in data operations defined by the widget developers. If these built-in operations do not meet users’ needs, users are forced to alter their workflows to align with the widget’s capabilities. The only alternative is to ask the widget developers to integrate the required data operation. For example, the widget shown in Figure 1 lacks a feature to select and view specific US states of interest. Say the user is interested in viewing only the states in the US South. Without a customization feature to select specific states, the user is forced to scroll through the table and find the Southern states manually.
Bridging the Gaps
We developed a framework called Magneton that redesigns existing widgets to address these gaps. As shown in Figure 2, Magneton introduces “stateful widgets” with state and interaction history management capabilities to ensure transparency and reusability. Moreover, we augmented the communication mechanism between the widget front-end view and the back-end Python kernel by implementing a wrapper so that users can override predefined widget data operations from the notebook. The technical details are explained in our recent publication at CHI 2023 (accepted as a late-breaking work). Let us see examples of these capabilities.
Figure 2: Design of (a) traditional and (b) Magneton widgets. The dashed (“- -”) elements, namely the stateful widget and widget view wrappers, are introduced by Magneton.
Example Scenario
Transparency and reusability. As users interact with the widget, each interaction is dynamically tracked. As shown in Figure 3, users can view their interaction history within the notebook and interactively revisit previous states. For example, suppose the user of the widget in Figure 1 explored three US states: Alaska, California, and Florida. Performing the same exploration with a Magneton-powered widget lets the user open the history view and revisit the age distribution of California by clicking “Restore.”
Figure 3: Exploring the interaction history using the “history view.”
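The idea of tracking each interaction and restoring an earlier state can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Magneton’s implementation: the class, its `select()` and `restore()` methods, and the placeholder numbers are all hypothetical.

```python
import copy


class StatefulWidget:
    """Toy widget that records every state transition (hypothetical design)."""

    def __init__(self, compute):
        self.compute = compute  # data operation: selection -> view data
        self.history = []       # recorded states, oldest first
        self.state = None
        self.select(None)       # record the initial state

    def select(self, selection):
        """Apply an interaction and append the resulting state to the history."""
        self.state = {"selected": selection, "data": self.compute(selection)}
        self.history.append(copy.deepcopy(self.state))
        return self.state

    def restore(self, index):
        """Make a recorded state current again without recomputing it."""
        self.state = copy.deepcopy(self.history[index])
        return self.state


# Placeholder numbers for illustration only.
AGES = {
    None: [30, 50, 20],
    "Alaska": [35, 55, 10],
    "California": [32, 50, 18],
    "Florida": [25, 45, 30],
}

w = StatefulWidget(lambda selection: AGES[selection])
for us_state in ("Alaska", "California", "Florida"):
    w.select(us_state)

w.restore(2)  # index 0 is the initial state, so index 2 is California
print(w.state["selected"])
```

In this sketch, restoring does not append a new history entry. Whether a restore should itself count as an interaction is a design choice the sketch leaves open.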
Users can programmatically export and reuse any widget state, thereby ensuring reusability. For example, as shown in Figure 4, the user issues the “view_state()” command from the notebook to access the age distribution of California. The user can then issue the “get_state()” command to load the state data into a Python object and reuse that object for downstream tasks within the same notebook.
Figure 4: Programmatically accessing a widget’s current state.
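The sketch below illustrates what exporting a state for downstream use might look like. Only the names `view_state()` and `get_state()` come from the description above. Their signatures, return types, the `ExportableWidget` class, and the numbers are assumptions made for illustration and may differ from the actual framework.

```python
import copy
import json


class ExportableWidget:
    """Toy stand-in for a widget whose current state can be exported."""

    def __init__(self, state):
        self._state = state

    def view_state(self):
        """Return a readable rendering of the current state (assumed behavior)."""
        return json.dumps(self._state, indent=2, sort_keys=True)

    def get_state(self):
        """Return a copy, so downstream code cannot mutate the widget's state."""
        return copy.deepcopy(self._state)


# Placeholder counts for illustration only.
widget = ExportableWidget(
    {"selected": "California", "distribution": {"0-17": 22, "18-64": 63, "65+": 15}}
)
print(widget.view_state())

# Downstream reuse: turn the exported counts into fractions of the total.
state = widget.get_state()
total = sum(state["distribution"].values())
shares = {group: count / total for group, count in state["distribution"].items()}
```

Returning a copy rather than the live state keeps later notebook cells from silently changing what the widget displays.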
Customizability. Users can write custom code in the notebook to override data operations predefined by the widget developers. For example, in Figure 1, the table’s data operation returns all the US states by default. Redesigning the same widget using the Magneton framework enables users to override the data operation corresponding to the table view. Returning to the earlier case of viewing only the US states in the South: how can we define a new data operation that semantically filters these states? Let us employ large language models, the vogue of recent years, for this purpose. In particular, the user defines a function in the notebook called “group_by_region()” that leverages the ChatGPT API to get the Southern states from the list of US states displayed in the table. As shown in Figure 5, the user overrides the built-in table computation function with the newly defined function, and then explores the Southern and Western states.
Figure 5: Overriding built-in operations with user-defined functions.
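The override pattern can be sketched as follows. This is a simplified illustration, not Magneton’s API: the `CustomizableWidget` class and its `override()` method are hypothetical, the signature of `group_by_region()` is assumed, and a fixed lookup table stands in for the ChatGPT call so that the example runs offline.

```python
US_STATES = ["Alaska", "California", "Florida", "Georgia", "Oregon", "Texas"]

# Stand-in for the LLM call described above: a fixed lookup table.
REGION_OF = {
    "Alaska": "West", "California": "West", "Oregon": "West",
    "Florida": "South", "Georgia": "South", "Texas": "South",
}


def all_states(states):
    """Built-in table operation: return every US state."""
    return list(states)


class CustomizableWidget:
    """Toy widget whose named data operations can be replaced at runtime."""

    def __init__(self, states):
        self.states = states
        self.operations = {"table": all_states}

    def override(self, name, func):
        """Swap a built-in data operation for a user-defined one."""
        if name not in self.operations:
            raise KeyError(f"unknown data operation: {name}")
        self.operations[name] = func

    def table_rows(self):
        """Rows the table view would show after running the current operation."""
        return self.operations["table"](self.states)


def group_by_region(states, region="South"):
    """User-defined operation: keep only the states in the given region."""
    return [s for s in states if REGION_OF.get(s) == region]


widget = CustomizableWidget(US_STATES)
default_rows = widget.table_rows()          # all six states
widget.override("table", group_by_region)
southern = widget.table_rows()              # only the Southern states
print(southern)
```

Keeping operations in a dictionary keyed by view name means the widget code never changes when a user swaps in a new function, as long as the replacement accepts and returns the same kind of data.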
Conclusion and Future Work
We contribute Magneton, a framework for composing interaction history-aware and customizable widgets to enable transparent, reusable, and expressive data science workflows in computational notebooks. The Magneton widgets offer several benefits that existing widgets lack. The built-in interaction history enables users to share insights and data among various steps within a project. The shared action feature empowers users to customize the widget, thereby reducing dependency on developers.
The framework opens the door to interesting future research on building more expressive widget-authoring frameworks, exploring optimization strategies to deal with large-scale data when rendering widgets, and designing widgets with cross-platform capabilities that accommodate users of varying roles and degrees of expertise. Let us elaborate on these ideas:
- Widget Authoring. The current design of Magneton still requires widget developers to author widgets that end-users can explore. In this respect, the Magneton framework parallels the D3-Vega stack for data visualization: Magneton operates as a kernel for authoring widgets. Therefore, a possible research direction is to investigate more expressive widget authoring strategies that utilize the kernel, for example, declarative specification of visualizations and interactions, or direct manipulation-based systems.
- Scalable Computation. For large-scale datasets, the latency of rendering Magneton widgets remains a bottleneck. VegaFusion, a recent work on enhancing the scalability of Vega visualization generation, performs automatic server-side scaling via partitioning strategies. Magneton can adopt similar strategies for tracking and maintaining interaction history and rendering visualizations. Other approaches that may be employed to redesign widgets for scale include applying classical database optimization techniques such as caching, pre-fetching, indexing, materialization, and incremental view maintenance on the server side.
- Interface Plasticity. Since notebooks can be collaborative, Magneton widgets may need to accommodate shared workflows, often involving users with different roles (e.g., data scientists and product managers). Therefore, the perceived value of an interface may vary across stakeholders within the collaborative setting. Another extension could be to equip Magneton widgets with cross-platform capabilities, such that these widgets can transform into a web application (and vice versa) to accommodate diverse stakeholder requirements.
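Of the optimizations listed under Scalable Computation, caching is the simplest to illustrate. The sketch below memoizes a data operation with Python’s standard `functools.lru_cache`, so that revisiting a US state does not recompute its distribution. The `distribution()` function, the toy records, and the call counter are hypothetical, and this is not something Magneton is stated to do today.

```python
from functools import lru_cache

# Hypothetical toy data: (US state, age group) records.
RECORDS = (
    ("Alaska", "18-64"), ("Alaska", "18-64"), ("Alaska", "65+"),
    ("California", "0-17"), ("California", "18-64"),
)

CALLS = {"count": 0}  # counts how often the expensive path actually runs


@lru_cache(maxsize=128)
def distribution(us_state):
    """Count records per age group for one US state; results are cached."""
    CALLS["count"] += 1
    counts = {}
    for state, age_group in RECORDS:
        if state == us_state:
            counts[age_group] = counts.get(age_group, 0) + 1
    # Return an immutable value so cached results cannot be mutated by callers.
    return tuple(sorted(counts.items()))


distribution("Alaska")
distribution("California")
distribution("Alaska")  # served from the cache, no recomputation
print(CALLS["count"], distribution.cache_info().hits)
```

A cache like this only helps when the underlying data is unchanged between interactions. If the data can change, the cache would need to be invalidated, for example with `distribution.cache_clear()`.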
Please check out the paper and our open-source repository for more details. If you are attending CHI 2023, please drop by the Late-Breaking Work (LBW) Poster Session A (Wednesday, April 26, 2023).
Written by: Sajjadur Rahman and Megagon Labs