Value-Spectrum: Quantifying Preferences of Vision-Language Models
via Value Decomposition in Social Media Contexts

Jingxuan Li*1, Yuning Yang*1 , Shengqi Yang2, Linfan Zhang1, Ying Nian Wu1
1University of California, Los Angeles; 2Los Alamos National Laboratory
*Equal contribution, alphabetical by first name

Introduction

Value-Spectrum Framework Overview

We introduce Value-Spectrum, a benchmark for systematically evaluating the preference traits of Vision-Language Models (VLMs) on visual content from social media, grounded in Schwartz's ten core human values:

  • 🤝 Benevolence — caring for and helping others
  • 🌍 Universalism — understanding, appreciation, and protection of all people and nature
  • 🧭 Self-Direction — independent thought and action
  • 🏆 Achievement — personal success through demonstrating competence
  • 🎢 Stimulation — excitement, novelty, and challenge in life
  • 🍰 Hedonism — pleasure and sensuous gratification
  • 🛡️ Security — safety, harmony, and stability of society and relationships
  • 📏 Conformity — restraint of actions that might upset others or violate social norms
  • 🧧 Tradition — respect, commitment, and acceptance of cultural or religious customs
  • 👑 Power — social status, prestige, and control over people and resources

Value-Spectrum Data Collection Pipeline

Value-Spectrum uses VLM agents embedded within social media platforms (e.g., TikTok, YouTube, and Instagram) to collect a dataset of 50,191 unique short-video screenshots. These screenshots span a wide range of topics, including lifestyle, technology, health, and more.
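
The collection is described as being driven by a VLM-based GUI agent; the agent's navigation logic is not reproduced here. Below is a minimal, hypothetical sketch of a screenshot-collection loop, assuming Playwright as the browser-automation backend. The feed URLs, scrolling behavior, and hash-based deduplication are illustrative assumptions, not the authors' implementation.

# Hypothetical screenshot-collection loop; the VLM-driven navigation itself is omitted.
import hashlib
import pathlib
import time

from playwright.sync_api import sync_playwright

# Illustrative feed entry points; the actual pages visited by the agent are not specified here.
PLATFORM_FEEDS = {
    "youtube": "https://www.youtube.com/shorts",
    "tiktok": "https://www.tiktok.com/foryou",
}

def collect_screenshots(out_dir: str = "screenshots", per_feed: int = 50) -> None:
    seen = set()
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        for platform, url in PLATFORM_FEEDS.items():
            page.goto(url, wait_until="networkidle")
            for _ in range(per_feed):
                shot = page.screenshot()  # bytes of the current viewport
                digest = hashlib.sha256(shot).hexdigest()
                if digest not in seen:  # keep only unique frames
                    seen.add(digest)
                    (out / f"{platform}_{digest[:12]}.png").write_bytes(shot)
                page.keyboard.press("ArrowDown")  # advance to the next short video
                time.sleep(1.0)  # let the next post render
        browser.close()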

Podcast overview (YouTube video, generated with NotebookLM).

Value-Spectrum Dataset

(Figure: dataset overview)

We introduce a novel dataset comprising 50,191 single-frame images, each extracted from a short video sourced from Instagram, YouTube, or TikTok. The source videos span multiple months and cover diverse topics, including family, health, hobbies, society, and technology. A key feature of this dataset is its annotation: each image is mapped to one or more of the 10 core human values from Schwartz's theory via keyword matching. In addition to the annotated image, each entry includes the original video link, relevant platform metadata (such as platform name and post date), and the VLMs' preference answers (e.g., like or dislike) for the image.

This collection, captured between July and October 2024 through a VLM-driven GUI agent, is structured in a vector database to facilitate research into VLM behavior, content preferences, and value decomposition across diverse social media landscapes. The dataset is intended to support deeper analysis of model behavior and the dynamics of online content. The full dataset, with its value annotations, is currently being prepared for public release on Hugging Face and will be available soon.
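
As a concrete illustration of the entry structure described above, here is a minimal sketch of one dataset record and its keyword-based value annotation. The `SpectrumEntry` class, its field names, and every keyword list are assumptions for illustration; the paper's actual schema and keyword-to-value mapping may differ.

# Hypothetical record schema and keyword lists for the Value-Spectrum entries.
from dataclasses import dataclass, field

VALUE_KEYWORDS = {
    "benevolence": ["helping", "charity", "kindness"],
    "universalism": ["environment", "equality", "nature"],
    "self_direction": ["creativity", "independence", "freedom"],
    "achievement": ["success", "award", "competition"],
    "stimulation": ["adventure", "thrill", "novelty"],
    "hedonism": ["pleasure", "fun", "indulgence"],
    "security": ["safety", "stability", "protection"],
    "conformity": ["politeness", "obedience", "rules"],
    "tradition": ["heritage", "ritual", "custom"],
    "power": ["status", "wealth", "authority"],
}

@dataclass
class SpectrumEntry:
    image_path: str                     # single frame extracted from the short video
    video_url: str                      # link back to the original post
    platform: str                       # e.g., "tiktok", "youtube", "instagram"
    post_date: str                      # platform metadata
    caption: str                        # post text used for keyword matching
    values: list = field(default_factory=list)          # mapped Schwartz values
    vlm_preference: dict = field(default_factory=dict)  # model name -> "like"/"dislike"

def annotate(entry: SpectrumEntry) -> SpectrumEntry:
    """Map an entry to one or more Schwartz values by keyword matching on its caption."""
    text = entry.caption.lower()
    entry.values = [v for v, kws in VALUE_KEYWORDS.items() if any(k in text for k in kws)]
    return entry

Under these illustrative keyword lists, a caption such as "A charity run to protect nature" would map the entry to both benevolence and universalism, showing how a single image can carry multiple value annotations.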

Experimental Results

To provide a comprehensive picture of Vision-Language Model (VLM) value preferences in social media contexts, we analyze model responses through the framework of Schwartz's 10 basic human values. The analysis covers several key dimensions: comparative VLM preference scores (considering model characteristics such as size and open-source status), average preference trends for each of the 10 values, individual VLM profiles and their preference consistency, VLM capabilities in value-driven role-playing, and preference alignment across diverse social media platforms.
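
The exact scoring formula is not spelled out on this page; one natural reading is a per-value preference score equal to the fraction of "like" responses a model gives to images annotated with that value. The sketch below computes that statistic over records shaped like the `SpectrumEntry` objects above; the function name and aggregation are assumptions.

# Hypothetical per-value preference aggregation; the paper's exact formula may differ.
from collections import defaultdict

def preference_scores(entries, model: str) -> dict:
    """Per-value preference score for one model: the fraction of 'like'
    answers among the images annotated with that value."""
    likes, totals = defaultdict(int), defaultdict(int)
    for entry in entries:  # entries are SpectrumEntry records from the dataset sketch
        answer = entry.vlm_preference.get(model)
        if answer is None:
            continue
        for value in entry.values:
            totals[value] += 1
            likes[value] += int(answer == "like")
    return {value: likes[value] / totals[value] for value in totals}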

Ablation Study: VLMs vs. Corresponding LLMs

To better understand the factors influencing value-preference outcomes on the Value-Spectrum dataset, this ablation study compares Vision-Language Models (VLMs) against their corresponding text-only Large Language Models (LLMs). We also examine the impact of input modality (multimodal vs. text-only) and combined input settings, using a validation set of 100 samples per value.
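
As a minimal sketch of how such an ablation could be set up (reusing the `SpectrumEntry` records from the dataset sketch above), the snippet below draws a 100-sample validation split per value and builds three input settings: image-only, text-only on the caption, and combined image plus caption. The prompt wording, condition names, and the choice of captions as the text-only input are illustrative assumptions, not the paper's exact protocol.

# Hypothetical ablation setup: per-value validation splits and input conditions.
import random

def build_ablation_splits(entries, per_value: int = 100, seed: int = 0):
    """Draw a validation split of `per_value` samples per value and expose the
    input settings compared in the ablation."""
    rng = random.Random(seed)
    by_value = {}
    for entry in entries:  # entries are SpectrumEntry records from the dataset sketch
        for value in entry.values:
            by_value.setdefault(value, []).append(entry)
    splits = {v: rng.sample(es, min(per_value, len(es))) for v, es in by_value.items()}

    def conditions(entry):
        prompt = "Would you 'like' or 'dislike' this post? Answer with one word."
        return {
            "image_only": {"prompt": prompt, "image": entry.image_path},
            "text_only": {"prompt": f"{prompt}\nCaption: {entry.caption}"},
            "combined": {"prompt": f"{prompt}\nCaption: {entry.caption}",
                         "image": entry.image_path},
        }

    return splits, conditions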

BibTeX

@inproceedings{Li2024ValueSpectrumQP,
  title={Value-Spectrum: Quantifying Preferences of Vision-Language Models via Value Decomposition in Social Media Contexts},
  author={Jingxuan Li and Yuning Yang and Shengqi Yang and Linfan Zhang and Ying Nian Wu},
  booktitle={Annual Meeting of the Association for Computational Linguistics},
  year={2025},
}