As stated, the idea that they're supposed to "learn" is totally overblown. Traditional models already have some AI built into them and already do this to an extent.
From what I understand (and this may not apply equally to AI models), AI assists with the initialization scheme: it combs through the ingested data and will "remove" what it believes to be bad data or an outlier, based on a slew of historical information. The idea is that this leads to a more accurate initialization, which is important because once you move forward in time you start to introduce error, and that error compounds over time. That is why operational (OP) forecast models are generally useless beyond D7-10 and can even be relatively useless past D5 if there is a lot going on. Error also comes from rounding and approximations, especially approximations.
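To put a rough picture on the compounding-error point, here is a minimal sketch. It uses the logistic map as a stand-in for a chaotic system; it is not a weather model, and the function names, the starting value, and the size of the initial error are all assumptions made up for illustration. Two runs start almost identically and the gap between them is tracked at each step.

```python
def logistic_step(x, r=3.9):
    """One step of the logistic map, a toy chaotic system (not a weather model)."""
    return r * x * (1.0 - x)


def error_growth(x0, perturbation, steps):
    """Run a 'truth' and a slightly-off 'forecast' side by side.

    Returns the absolute difference between the two runs after each step.
    """
    truth = x0
    forecast = x0 + perturbation  # tiny error in the initial state
    errors = []
    for _ in range(steps):
        truth = logistic_step(truth)
        forecast = logistic_step(forecast)
        errors.append(abs(truth - forecast))
    return errors


# Hypothetical numbers: an initial error of one part in 100 million.
errors = error_growth(x0=0.4, perturbation=1e-8, steps=60)
for step in (1, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: error = {errors[step - 1]:.3e}")
```

The exact numbers don't matter. The point is that the gap starts out negligible and ends up as large as the signal itself, no matter how small the initial error is made.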
AI models are built on a wealth of historical data. They look for past patterns similar to the initialized field and then forecast based on how those similar patterns evolved in the past.
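The "look for similarities" idea can be sketched as a toy nearest-neighbour (analog) forecast. This is only a loose stand-in: actual AI weather models learn the mapping with neural networks rather than doing an explicit lookup, and the function name, the `history` layout, and the numbers below are all hypothetical.

```python
import math


def analog_forecast(history, current, k=2):
    """Forecast by averaging what followed the k most similar past states.

    history: list of (state, next_state) pairs, each a sequence of floats
    current: the initialized state to find analogs for
    """
    # Rank past states by Euclidean distance to the current state.
    ranked = sorted(history, key=lambda pair: math.dist(pair[0], current))
    nearest = ranked[:k]
    n_vars = len(nearest[0][1])
    # Average the successors of the closest analogs, variable by variable.
    return [
        sum(next_state[i] for _, next_state in nearest) / len(nearest)
        for i in range(n_vars)
    ]


# Made-up two-variable "states" and what came after each one.
history = [
    ((0.0, 0.0), (1.0, 1.0)),
    ((0.1, 0.0), (1.2, 1.0)),
    ((5.0, 5.0), (9.0, 9.0)),
]
print(analog_forecast(history, current=(0.05, 0.0), k=2))
```

If the current state looks like nothing in `history`, the closest analogs are still far away and the forecast has little to stand on.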
The challenge in all of this is that there is still a lot we don't understand about weather, particularly the processes that occur during storm evolution. It becomes even more of a challenge because, for forecast models to account for these processes, we have to be able to parameterize them.
There is much more to this than just verifying a specific level or variable, and even that leads to a lot of questions. In a tame weather pattern that is not hostile, AI will probably outperform, but what value is that really adding?