Making the best decisions without having to handle all cards

Article “Apocalypse: A critical class of POMDP with Omega-enrolled objectives” The AAAI won the excellent paper award at the 2025 conference, a prestigious international conference on artificial intelligence. This year, only three papers received 3,000 accepted and 12,000 submits awards! These validation borders crown the results of research launched in Bordeaux (France) in the Synthes team in the Computer Science Research Laboratory (Labrie), where four of the authors work: Maryus Belly, Nathanal Fijlco, Hugo Gimbert, and Pierre Wendenov. The function also included researchers from Paris (Florian Horn) and Antwerp (Gillermo A. Perez). The article is freely available on the archive, and this post outlines its main ideas.

Labrie’s Synthes Team faces the challenging problem of program synthesis – developing algorithms that produce other algorithms based on the specification of a few examples or expectations. In practice, these powerful algorithms are used in different contexts. For example, most spreadsheet applications today provide automatal Fill-filled functions: you fill in a few cells, and based on these examples, a small algorithm is synthesized to complete the rest (dippacinth). Another example is the robotic control: the Operator Puretor assigns a task to the robot, such as regaining the control of the ball in a robo -up match, and the Robot’s algorithms determine the proper order of movements and actions to achieve the goal.

The origin of this task is a great example of our research activity, which is international, interdependent and collaborative. At a conference held in the Mastrist (Netherlands) in 2022, the award winner was the origin of the AAAI paper, where computer enemies. Maniko, mathematicians and economists gathered. At this event, Giloum Vijral and Bruno Giliotta presented their research on the mathematical study of the “Information Relative” event. The research then originated in Europe – from the sunny atmosphere of Bordeaux to the West to the ARS RSO, enthusiastic by the polish weather, passed through Paris, Mastric and Antwerp. The results were not achieved in one day, but gradually enriched by the views and skills of various contributors, by the process of tests, failures, dead ends and various perspectives. This is an excellent experience of research: when dealing with a new problem, starts with a small step. Understanding makes the ens more enlargeable, until those steps grow larger until the problem is solved or not. Research means learning to walk again with every new challenge!

When engineers and AI researchers need to solve synthesis problems, they usually use math formal pavement Marcov decision processesOr just MDP. The central question is: In the essential situation of the order of decisions described by MDP, how can one make good decisions? Or even better – how can automatically count the best potential sequence of any decisions, also known as one Holistic strategy?

What is exactly MDP?

In the context of this research, MDP is a limited-state system whose evolution is determined by both by decisions (choosing action) and opportunity. Here’s what such an animal looks like:

In everyday life, MDP can be useful in many situations – for example, while playing SolitaryAlso identified Patience Neither Spider In its popular type. The following scenario shows the moment to make a decision in the MDP: Should the black king be left to the left on an empty stack? If so, what? The decision is challenging because many cards remain down and will be announced later1.

1 Even when all cards are announced, the problem is not negligible.

Two main types of AI algorithms to solve MDP

There are two main categories of AI algorithms to solve the MDP, which seems to be the same before but the computer is completely different .No is completely different:

  1. Arrogant -based algorithms – This works well in practice but lacks the theoretical explanation for why they succeed. Most machine learning methods, especially using neural networks (DERPR), come into this category.
  2. Specific algorithms -This is always guaranteed to provide the correct answer, but is often slower than heristic based algorithms. They are related to the field of trusted AI, based on the fantasies of the calculation and decisiveness worn by Alan Turing.

Our article comes in the second category: a reliably calculates a specific settlement in the form of the best strategy of the proposed algorithm, which can be used with full confidence.

Certain calculation and limit of AI education

Let’s become real-DEPRL-based learning techniques can calculate the strategy for very complex examples, while certain techniques based on computability theory are currently limited to simple examples. For example, Google Deepmind used dipral techniques to synthesize pending strategies GalaxyA popular video game, which requires dozens of decisions per second based on millions of dimensions. Deepmind’s AI initially defeated the best players in the world, but his strategy was not the best-the back of the back was discovered.

Currently specific methods are impractical to solve a complex problem GalaxyBut it does not prevent them from becoming effective in practice. For example, Bordeaux’s second success story: The Robon team at Labrie won the Gold Medal in the Robockup 2023, using specific methods to solve small MDPs depending on information distribution in multiple co -operative robots.

The complexity of decision problems and the role of information

The difficulty to solve the problems of the decision varies greatly based on the information available when making the decision. Ideal is the case Full informationWhere all data is available. A classic example is a navigating robot on one way where the exact position and approach of the layout and robot is known. In this case, the calculation of the solution is relatively easy: the algorithm can find a way to exit (eg, using the algorithm of the Dijkstra) and follow it step by step.

However, in real world problems, it is rare to have all the information. For example, in SolitarySome cards are hidden, which the player needs to make assumptions. Generally, such problems cannot be resolved exactly: according to Turing’s Competency Theory, there is nothing in the algorithm that a computer cannot operate, can always solve all MDP control problems with precision. This may seem frustrating, but it does not prevent the identification of special cases. The problem becomes easier. In the theoretical Computer Vijnan, this is called Autumn.

Award -winning research contribution

AAAI-PURSARS SHORTA A MDP: Problems of decision with “strong revelation”, where there is always a nonzero probability that the exact position of the world will be declared at every step. The paper also provides critical results for “weak reveals”, where a particular state is finally guaranteed to be declared but not required on every step – equal SolitaryWhere the hidden cards are slowly exposed.

A research paper should always pay attention to the future and open new directions. Can analyze MDP with the proposed algorithm disclosure (stronger people). Is an interesting perspective Opposite the problem: When an algorithm is used in any game, what happens? These players can enable all games analysis, restricting the amount of strategy they can use or the amount of information they can process.

Tags Gs: AAAI. AAAI2025


Nathanal Fijlco is the Director of Research (Full Professor) at CNRS


Hugo Gimbert is a working CNRS researcher


The Florian Horn is a research collaborative with CNR


Gillermo is a collaborative Professor of Perez Antwerp University


Pier

Scroll to Top