
Opinion Dynamics & Information Spread Control

Planning and learning methods for controlling misinformation propagation in social networks, from the interactive InfoSpread system to generalized policies that transfer across unseen graph topologies.

Bharath Muppasani, Protik Nag, Vignesh Narayanan, Biplav Srivastava, Michael N. Huhns

AAAI 2024 Demo · NeurIPS 2024 · GenPlan at AAAI 2025 · AAAI 2024 Best Demo Award

  • 3 project phases
  • 86% infection reduction on Cora
  • 6 reward structures studied

Overview

InfoSpread studies how to control misinformation propagation in social graphs through sequential intervention planning. Each person is modeled as an agent with a continuous opinion value, trust-weighted interactions, and limited opportunities for corrective intervention.

The project develops across three connected phases: an interactive demo system for expressive simulation, generalized planning policies that transfer across graph families, and dialog-state control for belief management in human-AI settings.

AAAI 2024 Demo

InfoSpread System

Interactive simulation and planning-backed intervention loops for misinformation control. The demo received the AAAI 2024 Best Demo Award.

NeurIPS 2024

Generalized Planning Policies

Graph-based policies trained on structured plan traces and simulation rollouts, designed to transfer from small training graphs to larger unseen networks.

AAAI 2025

Dialog-State Belief Control

Extension of the same planning view to dialog management, where user beliefs across topics become the latent planning state.

Modeling Dynamic Opinion Networks

The social system is represented as a directed graph \(G = (V, E)\) where each node has a continuous belief value \(x_i(t) \in [-1, 1]\) and each edge carries a trust weight \(\mu_{ik} \in [0,1]\). A node is considered infected when its belief crosses a misinformation threshold such as \(x_i(t) < -0.95\).

Opinion updates follow a linear adjustment process driven by trust-weighted interactions between connected agents.

$$x_i(t+1)=x_i(t)+\mu_{ik}\bigl(x_k(t)-x_i(t)\bigr)$$

At each step, the planner selects a budget-constrained subset of target nodes for corrective intervention. This turns opinion control into a long-horizon sequential decision problem rather than a one-shot classification task.

Environment and Intervention Model:

At each timestep, infected nodes propagate misinformation to their immediate susceptible neighbors. An intervention policy chooses a budget-constrained subset of nodes to receive authentic information.

Intervened nodes move toward positive belief, and once a node crosses the positive threshold it is treated as blocked from further misinformation spread. The episode ends when no candidate nodes remain.

Figure: Propagation and intervention dynamics in the simulation environment.

Infection Rate Metric: The planning objective is evaluated using the fraction of infected nodes at each timestep.
$$\text{InfectionRate}(t)=\frac{|\{v \in V : x_v(t)<-0.95\}|}{|V|}$$
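As a helper, the metric is a one-liner; the threshold default mirrors the \(-0.95\) cutoff used above (the function name is illustrative).

```python
def infection_rate(x, threshold=-0.95):
    """Fraction of nodes whose belief lies below the misinformation threshold."""
    return sum(1 for v in x if v < threshold) / len(x)

print(infection_rate([-1.0, -0.96, 0.2, 0.9]))  # 2 of 4 nodes infected -> 0.5
```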

Intervention Planning Strategies

Combinatorial node selection quickly makes exact planning intractable. To scale to larger networks and transfer across topologies, we therefore focus on generalized approaches that learn reusable intervention strategies from graph structure.

Supervised Learning (SL) with GCNs

We use Graph Convolutional Networks (GCNs) to rank intervention candidates based on structural signals such as connectivity, local polarity, and influence flow. Ground-truth supervision is produced from search-derived solutions on small training graphs.

This framing addresses the generalized planning question directly: learn a transferable intervention policy that is not tied to specific node identifiers.
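As a rough sketch of the scoring step, a single mean-aggregation graph-convolution layer over the adjacency matrix can be written with NumPy. The random weights below are placeholders for parameters that, per the text, are trained on search-derived labels from small graphs.

```python
import numpy as np

# Illustrative single GCN layer: H' = ReLU(D^{-1} A_hat H W), where
# A_hat = A + I adds self-loops and D is A_hat's degree matrix.
# Weights are random stand-ins, not trained InfoSpread parameters.

def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # mean aggregation over neighbors
    return np.maximum(0.0, D_inv @ A_hat @ H @ W)

rng = np.random.default_rng(0)
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path graph
H = np.array([[-1.0], [0.2], [0.9]])                          # beliefs as node features
W = rng.standard_normal((1, 4))

embeddings = gcn_layer(A, H, W)                 # (3 nodes, 4 hidden dims)
scores = embeddings @ rng.standard_normal((4,)) # per-node intervention scores
top = int(np.argmax(scores))                    # highest-ranked candidate
```

A trained readout would replace the random projection, ranking nodes without relying on node identifiers, which is what allows transfer to unseen graphs.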

Figure: Inference pipeline for the generalized planning strategy learned with graph neural models.

Reinforcement Learning (RL) & Reward Engineering

To remove expensive supervised labeling, we introduced a reinforcement learning pipeline with a Deep Value Network (DVN) that learns directly from simulation outcomes.

RL Advantage: Learning from simulation feedback sidesteps the NP-hard, search-based label generation required for supervision, while retaining strong generalization across graph families and scales.

A key NeurIPS 2024 contribution is the study of six reward structures balancing infection control, susceptibility, and intervention speed:

Reward structures (definition and role):

  • R0 (Delta Infection), \(-\Delta I_t\): rewards reductions in infection rate after an action.
  • R1 (Candidate Suppression), \(-|C_t|\): encourages policies that shrink the set of immediate high-risk neighbors.
  • R2 (Hybrid Local+Global), \(-|C_t|-\Delta I_t\): combines local containment with global infection reduction.
  • R3 (Episode Speed), \(1-T_{\text{used}}/T_{\max}\): rewards finishing containment quickly within the step budget.
  • R4 (Current Infection), \(-I_t\): penalizes states with high overall infection.
  • R5 (Mixed Episodic), \(-|C_t|-T_{\text{used}}/T_{\max}\): balances candidate suppression with faster episode completion.
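The six reward structures translate directly into code. The episode statistics below (delta_infection, n_candidates, t_used, t_max, infection) are illustrative names assumed to come from the simulator after each action.

```python
# Hedged sketch of the reward structures R0-R5; argument names are
# assumptions, not the paper's notation-to-code mapping.

def r0(delta_infection):                 # Delta Infection: -dI_t
    return -delta_infection

def r1(n_candidates):                    # Candidate Suppression: -|C_t|
    return -n_candidates

def r2(n_candidates, delta_infection):   # Hybrid Local+Global
    return -n_candidates - delta_infection

def r3(t_used, t_max):                   # Episode Speed
    return 1 - t_used / t_max

def r4(infection):                       # Current Infection: -I_t
    return -infection

def r5(n_candidates, t_used, t_max):     # Mixed Episodic
    return -n_candidates - t_used / t_max

# An action that cuts infection by 10% while 2 candidates remain:
print(r0(-0.10), r2(2, -0.10))
```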

InfoSpread System Implementation

We developed InfoSpread to connect theoretical planning models to practical experimentation and operator-facing analysis.

Numeric PDDL Formulation: The system includes a numeric PDDL representation with explicit typed entities (agent, source, topic) and fluents such as have-opinion and have-trust, enabling direct execution on planners like Metric-FF.

The platform compiles network configurations into executable planning problems, launches planning/search backends, and visualizes intervention sequences for human-in-the-loop analysis and override.
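The compilation step can be sketched as a small generator that emits a numeric PDDL problem string. The domain name, goal condition, and exact fluent arities here are assumptions beyond the have-opinion and have-trust fluents named above.

```python
# Illustrative compiler from a tiny network config to a numeric PDDL problem.
# Domain name "infospread" and the goal are placeholders; the real
# InfoSpread domain file may differ.

def to_pddl_problem(name, opinions, trust):
    """opinions: {node_id: belief}; trust: {(src, dst): weight}."""
    agents = " ".join(f"a{i}" for i in opinions)
    init = [f"(= (have-opinion a{i}) {x})" for i, x in opinions.items()]
    init += [f"(= (have-trust a{i} a{k}) {mu})" for (i, k), mu in trust.items()]
    return (
        f"(define (problem {name}) (:domain infospread)\n"
        f"  (:objects {agents} - agent)\n"
        f"  (:init {' '.join(init)})\n"
        f"  (:goal (forall (?a - agent) (>= (have-opinion ?a) 0))))"
    )

print(to_pddl_problem("demo", {0: -1.0, 1: 0.2}, {(0, 1): 0.5}))
```

The emitted problem can then be handed to a numeric planner such as Metric-FF alongside the domain file.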

Video: InfoSpread system walkthrough (Google Drive).

Figure: Interactive dashboard for inspection and intervention analysis.

Extension to Dialog Management

Recent work extends this formulation to conversational dialog management: a user state is represented as connected topic-beliefs, and utterances become interventions over that latent belief graph.
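Under this view the same update machinery carries over, with topics playing the role of nodes. A minimal sketch, with all names, values, and the coupling rule assumed purely for illustration:

```python
# Dialog state as a topic-belief graph: an utterance nudges one topic's
# belief, and the shift spills over to linked topics via trust-like
# weights, mirroring the opinion update. Everything here is illustrative.

user_state = {"vaccines": -0.6, "public-health": 0.1}
links = {("vaccines", "public-health"): 0.4}

def utter(state, links, topic, strength):
    """Apply one utterance as an intervention on the latent belief graph."""
    state[topic] = min(1.0, state[topic] + strength)  # direct corrective nudge
    for (a, b), mu in links.items():
        if a == topic:
            # Coupled drift on linked topics, as in x_i += mu * (x_k - x_i).
            state[b] += mu * (state[a] - state[b])

utter(user_state, links, "vaccines", 0.5)
```

Planning over such a state lets a policy target coherent shifts across linked topics rather than optimizing one topic in isolation.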

This links opinion dynamics with safer human-AI interaction design, where policies are optimized not just for one topic outcome but for coherent, multi-topic behavior change.

Publications

  • GenPlan at AAAI 2025

    On Generalized Planning for Controlling Opinion Networks: Interpreting Human-AI Dialog States and Beliefs

    Bharath Muppasani, Vignesh Narayanan, Biplav Srivastava

  • NeurIPS 2024

    Towards Effective Planning Strategies for Dynamic Opinion Networks

    Bharath Muppasani, Protik Nag, Vignesh Narayanan, Biplav Srivastava, Michael N. Huhns

  • AAAI 2024 Demo

    Expressive and Flexible Simulation of Information Spread Strategies in Social Networks Using Planning

    Bharath Muppasani, Vignesh Narayanan, Biplav Srivastava, Michael N. Huhns