On Dec. 21, 2022, just as peak holiday season travel was getting underway, Southwest Airlines went through a cascading series of scheduling failures, initially triggered by severe winter weather in the Denver area. But the problems spread through its network, and over the next 10 days the crisis left 2 million passengers stranded and cost the airline $750 million.
How did a localized weather system end up triggering such a widespread failure? Researchers at MIT have examined this widely reported failure as an example of cases where systems that work smoothly most of the time suddenly break down and cause a domino effect of failures. They have now developed a computational system that combines sparse data about a rare failure event with much more extensive data on normal operations, to work backward and pinpoint the root causes of the failure, and hopefully to find ways of adjusting such systems to prevent these failures in the future.
The findings were presented at the International Conference on Learning Representations (ICLR), held in Singapore from April 24-28, by MIT doctoral student Charles Dawson, professor of aeronautics and astronautics Chuchu Fan, and colleagues from Harvard University and the University of Michigan.
“The motivation behind this work is that it’s really frustrating when we have to interact with these complicated systems, where it’s really hard to understand what’s going on behind the scenes that’s creating these issues or failures,” says Dawson.
The new work builds on previous research from Fan’s lab, which looked at hypothetical failure-prediction problems, such as groups of robots working together on a task, or complex systems such as the power grid, seeking ways to predict how such systems may fail. “The goal of this project was to turn that into a diagnostic tool that we could use on real-world systems,” says Fan.
The idea was to provide a way for someone to “give us data from a time when this real-world system had an issue or a failure,” says Dawson, “and we can try to diagnose the root causes, and provide a little bit of a look behind the curtain at this complexity.”
The intent is for the methods they developed “to work for a pretty general class of cyber-physical problems,” Dawson says. These are problems in which “you have an automated decision-making component interacting with the messiness of the real world,” he explains. Tools are available for testing software systems that operate on their own, but the complexity arises when that software has to interact with physical entities in a real-world setting, whether that means scheduling aircraft or controlling the inputs and outputs on an electric grid. In such systems, he says, what often happens is that the software makes a decision, and that decision then sets off a chain of knock-on effects.
One key difference, though, is that in systems like teams of robots, unlike the scheduling of airplanes, “we have access to a model in the robotics world,” says Fan, who is a principal investigator in MIT’s Laboratory for Information and Decision Systems (LIDS). “We do have some good understanding of the physics behind the robotics, and we do have ways of creating a model” that represents their activities with reasonable accuracy. But airline scheduling involves processes and systems that are proprietary business information, so the researchers had to find ways to infer what was behind the decisions using only the relatively sparse publicly available information, which essentially consisted of the actual arrival and departure times of each plane.
“We have grabbed all this flight data, but there is this entire scheduling system behind it, and we don’t know how that system is working,” says Fan. And the data relating to the actual failure covers only several days, compared with the far larger body of data on normal flight operations.
The impact of the weather events in Denver showed up clearly in the flight data during Southwest’s scheduling crisis, in the longer-than-normal turnaround times between landing and takeoff at the Denver airport. But the way that impact cascaded through the system was less obvious and required further analysis. The key turned out to involve the concept of reserve aircraft.
Airlines typically keep some planes in reserve at various airports, so that if problems are found with a plane scheduled for a flight, another plane can be substituted quickly. Southwest uses only a single type of aircraft, so its planes are all interchangeable, which makes such substitutions easier. But most airlines operate on a hub-and-spoke system, with a few designated hub airports where most of the reserve aircraft can be kept, whereas Southwest does not use hubs, so its reserve planes are more scattered throughout its network. And the way those planes were deployed turned out to play a major role in the unfolding crisis.
“The challenge is that there’s no public data available in terms of where the aircraft are stationed throughout the Southwest network,” Dawson says. “What we’re able to find using our method is, by looking at the public data on arrivals, departures, and delays, we can back out what the hidden parameters of those aircraft reserves could have been.”
What they found was that the way the reserves were deployed was a “leading indicator” of the problems that cascaded into a nationwide crisis. Some parts of the network that were directly affected by the weather were able to recover quickly and get back on schedule. “But when we looked at other areas in the network, we saw that these reserves were just not available, and things just kept getting worse.”
For example, the data showed that Denver’s reserves were dwindling rapidly because of the weather delays, but then “it also allowed us to trace this failure from Denver to Las Vegas,” he says. While there was no severe weather there, “our method was still showing us a steady decline in the number of aircraft that were able to serve flights out of Las Vegas.”
“What we found was that there were these circulations of aircraft within the Southwest network, where an aircraft might start the day in California and then fly to Denver, and then end the day in Las Vegas,” he says. What happened in the case of this storm was that the cycle was interrupted. As a result, “this one storm in Denver breaks the cycle, and suddenly the reserves in Las Vegas, which is not affected by the weather, start to deteriorate.”
In the end, Southwest was forced to take a drastic measure to resolve the problem: It had to do a “hard reset” of its entire system, canceling all flights and flying empty aircraft around the country to rebalance its reserves.
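The cascade described above can be illustrated with a deliberately simple toy model. The sketch below is not the researchers’ model: the function name, the one-airport setup, and every number in it are hypothetical, chosen only to show how a storm at an upstream airport can drain the reserve at a downstream airport that has clear skies.

```python
def simulate_downstream_reserve(arrivals_per_day, daily_departures, initial_reserve):
    """Toy model: spare aircraft at an airport fed by flights from a storm-hit one.

    Returns one (reserve_left, cancelled_flights) pair per day.
    """
    reserve = initial_reserve
    history = []
    for arrivals in arrivals_per_day:
        # Departures that cannot be flown by today's inbound aircraft.
        shortfall = max(0, daily_departures - arrivals)
        # Cover as much of the shortfall as possible from the local reserve.
        covered = min(reserve, shortfall)
        reserve -= covered
        history.append((reserve, shortfall - covered))
    return history


# Hypothetical numbers: a two-day storm upstream halves the inbound aircraft.
arrivals = [5, 5, 10, 10]
history = simulate_downstream_reserve(arrivals, daily_departures=10, initial_reserve=6)
for day, (reserve, cancelled) in enumerate(history):
    print(f"day {day}: reserve={reserve}, cancelled={cancelled}")
```

In this toy setup the reserve never refills on its own once the storm has passed, which loosely mirrors why an explicit rebalancing step was needed.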
Working with experts in air transportation systems, the researchers developed a model of how the scheduling system is supposed to work. Then, “what our method does is, we’re essentially trying to run the model backwards,” Dawson says. Starting from the observed outcomes, the model allows them to work back to see what kinds of initial conditions could have produced those outcomes.
While the data on the actual failures was sparse, the extensive data on typical operations helped teach the computational model “what is feasible, what is possible, what’s the realm of physical possibility here,” Dawson says. That gives the model the domain knowledge to then say, in this extreme event, what the most likely explanation for the failure is.
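The idea of running a model backward to find the most likely explanation can be sketched with a brute-force example. This is not the researchers’ method; it swaps in a plain grid search over candidate values with a uniform prior and an assumed Gaussian noise model. The forward model, the function names, and all the numbers are hypothetical. The restricted list of candidate values stands in, very loosely, for what normal-operations data teaches about what is feasible.

```python
import math


def predicted_cancellations(initial_reserve, arrivals_per_day, daily_departures):
    """Hypothetical forward model: cancellations per day for a given hidden reserve."""
    reserve = initial_reserve
    cancelled = []
    for arrivals in arrivals_per_day:
        shortfall = max(0, daily_departures - arrivals)
        covered = min(reserve, shortfall)
        reserve -= covered
        cancelled.append(shortfall - covered)
    return cancelled


def reserve_posterior(observed, arrivals_per_day, daily_departures,
                      candidates, noise_std=1.0):
    """Posterior over the hidden reserve size: uniform prior, Gaussian noise."""
    log_weights = {}
    for reserve in candidates:
        predicted = predicted_cancellations(reserve, arrivals_per_day, daily_departures)
        squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
        log_weights[reserve] = -0.5 * squared_error / noise_std ** 2
    # Normalize in a numerically stable way (subtract the max before exponentiating).
    peak = max(log_weights.values())
    weights = {r: math.exp(lw - peak) for r, lw in log_weights.items()}
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


# Hypothetical observations: cancellations seen over four days.
observed = [0, 4, 0, 0]
posterior = reserve_posterior(observed, [5, 5, 10, 10], daily_departures=10,
                              candidates=range(0, 11))
best_guess = max(posterior, key=posterior.get)
print(best_guess)
```

A grid search only works here because there is a single hidden number with a handful of possible values. A real scheduling network has many hidden quantities, which is why more scalable inference tools are needed in practice.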
This could lead to a real-time monitoring system, he says, in which data on normal operations is constantly compared with current data to determine what the trend looks like. “Are we trending toward normal, or are we trending toward extreme events?” Seeing signs of impending issues could allow for preemptive measures, such as redeploying reserve aircraft in advance to areas of anticipated problems.
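As a rough illustration of that kind of monitoring, the sketch below compares recent measurements against a baseline drawn from normal operations. It is a minimal example under stated assumptions, not a description of any system the team has built: the drift score, the threshold of 3, the function names, and the turnaround times are all hypothetical.

```python
from statistics import mean, stdev


def drift_score(baseline, recent):
    """How many baseline standard deviations the recent mean sits from normal."""
    return (mean(recent) - mean(baseline)) / stdev(baseline)


def trending_toward_extreme(baseline, windows, threshold=3.0):
    """True if successive windows drift further from normal and end past the threshold."""
    scores = [abs(drift_score(baseline, window)) for window in windows]
    rising = all(later > earlier for earlier, later in zip(scores, scores[1:]))
    return rising and scores[-1] > threshold


# Hypothetical turnaround times in minutes.
normal_turnarounds = [40, 42, 38, 41, 39]
recent_windows = [[41, 43, 42], [44, 46, 45], [48, 52, 50]]
alert = trending_toward_extreme(normal_turnarounds, recent_windows)
print(alert)
```

The check asks two questions at once: whether each window sits further from normal than the one before it, and whether the latest window is far enough out to matter.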
Fan says work on developing such systems is continuing in her lab. In the meantime, the team has produced an open-source tool for analyzing failure systems, called CalNF, which is available for anyone to use. Dawson, who earned his doctorate last year, is now working as a postdoc, applying the methods developed in this work to understanding failures in power networks.
The research team also included Max Li from the University of Michigan and Van Tran from Harvard University. The work was supported by NASA, the Air Force Office of Scientific Research, and the MIT-DSTA program.