Rethinking Ranking: Introducing Multi-Conditional Ranking (MCR) with LLMs

Ranking algorithms are integral to numerous applications in our daily digital interactions. Search engines like Google utilize ranking models to present the most relevant web pages in response to user queries. E-commerce platforms employ ranking systems to sort products based on relevance, popularity, and user preferences. Social media platforms use ranking algorithms to order content in users’ feeds, prioritizing posts that are deemed more engaging or relevant to the user. 

Traditional ranking systems typically rely on one of three main approaches: Pointwise, Pairwise, and Listwise. In a Pointwise approach, each item is scored independently, and the final order is determined by these scores. Pairwise ranking, on the other hand, compares items two at a time to decide their relative importance, minimizing errors in their order. Meanwhile, the Listwise approach considers the entire set of items simultaneously, optimizing the final arrangement as a whole.
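
To make the distinction concrete, here is a minimal Python sketch of the three paradigms. The scoring and comparison functions are stand-ins for learned models, not any particular system:

```python
from itertools import combinations

items = ["doc_a", "doc_b", "doc_c"]

# Pointwise: score each item independently, then sort by score.
def relevance(item: str) -> float:
    # Stand-in for a learned relevance model.
    return {"doc_a": 0.2, "doc_b": 0.9, "doc_c": 0.5}[item]

pointwise = sorted(items, key=relevance, reverse=True)

# Pairwise: compare items two at a time and order by number of wins.
def prefer(a: str, b: str) -> bool:
    # Stand-in for a learned pairwise comparator.
    return relevance(a) > relevance(b)

wins = {item: 0 for item in items}
for a, b in combinations(items, 2):
    wins[a if prefer(a, b) else b] += 1
pairwise = sorted(items, key=wins.get, reverse=True)

# Listwise: the model sees the whole candidate list at once
# and directly produces a permutation.
def rank_list(candidates: list[str]) -> list[str]:
    # Stand-in for a learned listwise model.
    return sorted(candidates, key=relevance, reverse=True)

listwise = rank_list(items)
print(pointwise, pairwise, listwise)
```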

While these strategies have proven effective in many scenarios, they usually focus on ordering a large set of items based on a single condition or a set of predefined criteria with a fixed sort order—like sorting products by customer rating or documents by relevance to a search term. But the real world is rarely so straightforward. Often, we need to rank items based on multiple, potentially conflicting conditions where the overall sorting logic isn’t predefined. For instance, an e-commerce website might need to consider factors like customer reviews, delivery time, and seller credibility, each with varying degrees of importance that depend on user preferences. Similarly, a hiring platform might need to balance conditions such as years of experience, specific skills, and cultural fit, where priorities can differ and sometimes even contradict one another. This lack of clarity in prioritization and the subjective nature of conditions make the task significantly more complex, requiring optimization beyond simple linear methods.

This is where Multi-Conditional Ranking (MCR) enters the picture. Instead of focusing on a single overarching condition, MCR focuses on scenarios where ranking must be determined based on multiple, prioritized criteria, which may sometimes conflict. In our recent work, we introduce MCRank, a specialized benchmark designed to test how well language models handle these multi-conditional scenarios. Observing the low performance of existing LLMs on MCRank, we also propose EXSIR, a novel reasoning framework that breaks down the problem into manageable steps, allowing LLMs to better address the intricacies of MCR tasks.

What Is Multi-Conditional Ranking (MCR)?

Imagine a teacher who needs to select some questions from a pre-existing list. Without needing to read or solve each question, the teacher might want to rank them based on conditions such as the following (sketched in code after the list):

  1. The topic of the question (e.g., math has higher priority than science).
  2. The difficulty level (only considering easy questions).
  3. Specific features (e.g., being multiple-choice based).
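
One way to see what makes this harder than a single sort: the conditions above can be encoded as a filter plus prioritized sort passes, applied from lowest to highest priority so that higher-priority conditions get the final say. A minimal sketch, with illustrative question metadata and an assumed priority order (topic highest):

```python
# Hypothetical question metadata; fields and values are illustrative.
questions = [
    {"id": 1, "topic": "science", "difficulty": "easy", "multiple_choice": True},
    {"id": 2, "topic": "math",    "difficulty": "hard", "multiple_choice": False},
    {"id": 3, "topic": "math",    "difficulty": "easy", "multiple_choice": True},
    {"id": 4, "topic": "science", "difficulty": "easy", "multiple_choice": False},
]

# Condition 2 is a filter: only easy questions are considered.
pool = [q for q in questions if q["difficulty"] == "easy"]

# Apply the remaining conditions from lowest to highest priority.
# Python's sort is stable, so each later (higher-priority) pass
# preserves the order imposed by earlier passes on ties.
pool.sort(key=lambda q: not q["multiple_choice"])  # lower priority: multiple-choice first
pool.sort(key=lambda q: q["topic"] != "math")      # higher priority: math before science

print([q["id"] for q in pool])  # -> [3, 1, 4] on this toy data
```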

Traditional ranking methods struggle to handle this complexity. They’re great at single-query tasks but fall short when multiple, sometimes conflicting, rules are at play. MCR captures these scenarios, where a small set of items must be ordered based on multiple, weighted, and often interacting conditions.

Our benchmark, MCRank, sets out to test LLMs’ ability to handle this task. It spans several axes of complexity: the number of conditions (1–3), the number of items (3, 5, or 7), and item length (from a few tokens to full paragraphs). We designed the benchmark to cover five condition types:

  • Positional: Placement of items in specific positions.
  • Locational: Sorting items based on geographical attributes.
  • Temporal: Sorting items by dates or time-related features.
  • Trait-Based: Placement of items based on specific characteristics (e.g., size, color).
  • Reason-Based: Sorting items requiring logical or mathematical reasoning.

Type          Example Condition
Positional    Item “[one of the items]” should be the last from the left
Locational    Items that are in Africa should appear at the beginning
Temporal      Sort items based on their deadline from the first to the last
Trait-Based   Sort the items based on their size from the smallest to the largest
Reason-Based  Items that have the largest yards of touchdown should appear at the beginning
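
For intuition, a single benchmark instance pairs a small item set with one or more prioritized conditions drawn from these types. The mock-up below is illustrative only; it is not MCRank’s exact schema:

```python
# An illustrative MCRank-style instance (not the benchmark's actual format).
example = {
    "items": ["Cairo", "Tokyo", "Nairobi"],
    "conditions": [
        # (priority, type, condition text); a higher number means higher priority.
        (1, "temporal",   "Sort items by the founding date of the city, oldest first"),
        (2, "locational", "Items that are in Africa should appear at the beginning"),
    ],
    # African cities first (high priority), ordered oldest-first within
    # that group (low priority).
    "gold_ranking": ["Cairo", "Nairobi", "Tokyo"],
}
```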

Real-World Applications of MCR

Multi-Conditional Ranking isn’t just an academic exercise—it has real-world impact:

  • Recommendation Systems: Sort products not just by reviews, but also by delivery time and customer-specific preferences. For example, imagine a user on a movie recommendation platform. To avoid reading every plot summary, they might want movies ranked by specific conditions: (1) shorter films first (low priority), (2) higher IMDb scores prioritized (medium priority), and (3) the top-rated movie placed last (high priority) since they’ve already seen it (see the sketch after this list).
  • Human Resources: Filter job candidates based on experience, education, and job-specific requirements. Assume a recruiter needs to rank shortlisted resumes to quickly identify the top candidate for detailed evaluation, saving time and effort. They might prioritize candidates based on: (1) having publications in top-tier conferences is considered a plus (low priority), (2) ranking those with more NLP experience higher (medium priority), and (3) moving the most overqualified candidate to the end (high priority) to focus on others first.
  • Education Platforms: Rank study materials based on difficulty, topic relevance, and usage priority.
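
As a concrete sketch of the movie example above (titles and numbers are invented), the three conditions can be applied from lowest to highest priority, with the positional condition handled as an explicit move after the sorts:

```python
# Toy data for the movie example; all values are invented.
movies = [
    {"title": "A", "runtime": 95,  "imdb": 8.8},
    {"title": "B", "runtime": 120, "imdb": 7.1},
    {"title": "C", "runtime": 80,  "imdb": 6.5},
    {"title": "D", "runtime": 150, "imdb": 8.8},
]

# Apply conditions from lowest to highest priority using stable sorts.
movies.sort(key=lambda m: m["runtime"])  # (1) low: shorter films first
movies.sort(key=lambda m: -m["imdb"])    # (2) medium: higher IMDb scores first

# (3) high: move the top-rated movie to the end, since the user has seen it.
top = max(movies, key=lambda m: m["imdb"])
movies.remove(top)
movies.append(top)

print([m["title"] for m in movies])  # -> ['D', 'B', 'C', 'A'] on this toy data
```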

In each of these applications, items often need to be ranked based on diverse and sometimes conflicting conditions, a challenge not adequately addressed by current ranking methodologies and benchmarks.

Why Are LLMs Struggling with MCR?

In our experiments, we found that even state-of-the-art LLMs like OpenAI o1-mini struggle as the number of conditions and items increases. Their accuracy drops dramatically when asked to handle three conditions across larger sets of items.

The key issue? LLMs tend to fail when conditions need to be prioritized and applied sequentially. Asking a model to sort items directly against a jumble of unordered, complex conditions often produces inconsistent, unreliable orderings.

EXSIR: A Better Way to Rank with LLMs

To address these challenges, we proposed a novel reasoning method called EXSIR (EXtract, Sort, and Iteratively Rank). Here’s how it works:

  1. Extract: The model identifies and extracts each condition from the given instructions.
  2. Sort: It prioritizes the conditions based on their importance.
  3. Iteratively Rank: It applies the sorted conditions one by one to refine the ranking step-by-step.

This approach breaks the ranking problem into smaller, more manageable steps, allowing LLMs to handle complex scenarios more effectively.
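
A minimal sketch of this loop in Python, where `llm` is a stand-in for any chat-model call and the prompts are our assumptions rather than the paper’s exact wording:

```python
def llm(prompt: str) -> str:
    """Stand-in for a chat-model call (e.g., an OpenAI or local model API)."""
    raise NotImplementedError

def exsir_rank(instruction: str, items: list[str]) -> str:
    # 1. Extract: pull each individual condition out of the instruction.
    conditions = llm(
        "List each ranking condition in the following instruction, "
        f"one per line:\n{instruction}"
    ).splitlines()

    # 2. Sort: order the extracted conditions by priority.
    ordered = llm(
        "Order these conditions from lowest to highest priority, one per line:\n"
        + "\n".join(conditions)
    ).splitlines()

    # 3. Iteratively Rank: apply one condition per call, refining the ranking.
    #    Applying lower-priority conditions first lets the highest-priority
    #    condition dominate, mirroring a stable multi-key sort.
    ranking = "\n".join(items)
    for condition in ordered:
        ranking = llm(
            f"Re-rank these items so that this condition holds: {condition}\n"
            f"Current ranking:\n{ranking}\n"
            "Return only the re-ranked items, one per line."
        )
    return ranking
```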

Our experiments showed that EXSIR boosts ranking accuracy by up to 14.4 percentage points, outperforming alternative reasoning methods like Chain-of-Thought (CoT) prompting.

How Does EXSIR Compare to Other Approaches?

We compared EXSIR to existing strategies like Zero-Shot CoT prompting and traditional ranking models like RankGPT and SFR. The results were clear:

  • EXSIR consistently outperformed existing methods.
  • Existing ranking systems built for single-query tasks failed to handle MCR’s complexity.

This suggests that structured, step-by-step reasoning is essential for tackling MCR tasks effectively.

[Figure: conditional ranking chart]

Looking Ahead: Next Steps for MCR Research

While EXSIR is a big step forward, there’s still work to do:

  • Optimizing EXSIR for real-world efficiency and lower computational cost.
  • Exploring multi-agent systems where different LLMs handle decomposition and ranking separately.
  • Introducing interactive systems where users can refine rankings in real time.

We’re excited to see how the broader research community builds upon our findings.

Conclusion

Ranking tasks are at the heart of many real-world systems, and Multi-Conditional Ranking (MCR) represents an important frontier for LLMs. With the MCRank benchmark and our EXSIR method, we’ve shown that LLMs can significantly improve their performance on these challenging tasks when guided by structured reasoning.

Whether you’re building smarter recommendation systems, enhancing educational platforms, or refining hiring pipelines, MCR and EXSIR offer a glimpse into the future of intelligent ranking. Read the research paper here.

Written by: Pouya Pezeshkpour and Megagon Labs

Follow us on LinkedIn and X for more! 
