Friday, 1 February 2013

Simulations Part 2: Yuya, Jund v2, Bannings and PT Nagoya

Summary from Last time

I think there was a misunderstanding about what I was trying to do with my last post.

To be clear: If you could guess the metagame and win percentage matrix perfectly you would know the best deck.

But this is obviously both unlikely and extremely costly time-wise. Instead I want to use the math, programming and examples to challenge commonly held beliefs of the “pro” community which may or may not be true. All of us rationalize deck choice and it is useful for us to at least try to apply an analytical lens to these arguments. So I wanted to summarize the practical advice from my last article:
  • Don’t play the most popular deck if it’s bad (even if you are good with it). The playskill edge you need to make this worth it is large. See the later section for details.
  • If you want to top 8 (or “do well”) focus on beating the most popular deck.
    • But a complete glass cannon doesn’t work either. See example: affinity/scapeshift/tron.
  • If you want to win (e.g. a PTQ), don’t worry about beating the most popular deck; worry about beating the decks that beat the most popular deck.
    • Top 8ing and Winning can require distinct deck selection considerations. See RPS example from previous post.
  • Just looking at what percent of the metagame a deck makes up (even a winner’s metagame à la Karsten/Chapin) is not a good indicator of what the best deck is, and even more unintuitively, it may not be a good guide as to what you should be focusing on beating.

This Week: Motherfucking Science!!!?#$

  1. Addressing reader comments on the previous article.
  2. How much is being Yuya worth (other than 57 pro points)?
  3. What happened to Jund?
  4. Theory as applied to PT Nagoya
  5. Theory vs Simulations. Math! Proofs! Almost Rigorous!
  6. 5 minute break to relieve your boner from the last section.
  7. Conclusions

1. Comments from last time

I would like to take a moment and genuinely thank everyone who made comments on the previous post. Most of them were on my Facebook wall and it was cool to see how many people enjoyed a slightly different approach to Magical analysis. I appreciate every single comment and will try to address some of the points here.

Thanks to Paul Jordan who did some more analysis and hooked me up with some Excel so that I could format things more easily.

Do More Trials. Short answer: I think this is a non-issue. I am not sure why 1000 trials is not enough. The distributions I am using aren’t exotic enough for me to think they warrant more, and my code is unbearably slow (1000 trials is already an overnight process). That being said, Jarvis has been sent the code and may be able to optimize it.
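A rough way to sanity check the trial count, assuming each trial is an independent tournament and we are estimating a single probability (say, how often a deck wins the event). The 30% figure below is an illustrative assumption, not a number from the simulation, and `standard_error` is just a helper name made up for this sketch.

```python
import math

def standard_error(p, trials):
    """Standard error of a probability estimated from independent trials."""
    return math.sqrt(p * (1.0 - p) / trials)

# Hypothetical case: a deck that wins the tournament about 30% of the time.
for trials in (1000, 10000):
    half_width = 1.96 * standard_error(0.30, trials)
    print(f"{trials} trials: roughly +/- {half_width:.3f} at 95% confidence")
```

Under these assumptions 1000 trials pins a 30% estimate down to within about three percentage points, and ten times the trials only tightens that by a factor of about three. Whether that is precise enough depends on how small a difference between decks you care about.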

What happens now that Jund lost BBE? To be answered in a third and final post, hopefully next week. I also want to address the importance of the number of rounds. But I imagine it will require a subsequent post, because people only want to read so much boring math. Eli Priest already had the gist if you read his Facebook post.

What about writing for a major website? Unfortunately not possible right now. I really appreciate everyone who shared and retweeted the link for this blog since I don’t have reach any other way. Special thanks to Sperling (who tweeted) and whoever posted it on Reddit (4000 views from Reddit and 400 from MTGSalvation forums).

What about Model/Metagame uncertainty? Is this useful? Again this isn’t a tool for predicting the tournament exactly ex-ante. Rather a tool that helps us analyze “How We Should Think”.

Top 8 is 3 of 5, did you account for this? No. In a related point, one reader thought I would systematically underestimate a top 8 deck winning (conditional on having top 8ed). Empirically that might be true. And the 3 of 5 might have to do with that. A deck with a great sb will get an edge in the top 8, rendering the win percentage matrix not constant throughout the tournament. It’s a rather trivial addition to my program to correct for this, but I am not sure how I would get the correct assumptions. Moreover, many tournaments rely on 2 of 3 in the top 8 (GPs, PTQs, FNM, etc.).

Pro Tours have Limited. Nothing much I can do about that other than assume it has zero impact or flip coins for every Limited match. I have asked Paul Jordan to look into how Jund players did during the Limited rounds of PT RTR (to get a sense of how much it matters), but it’s a ton of work, I imagine.

What about variations in player skill and deck construction? See the section on what being Yuya is worth. Unfortunately I can only use Paul Jordan’s categorizations of decks. And he needs to aggregate disparate lists to get a meaningful sample size on deck win percentages.

Is there anything else more practical we can do? Some ideas I have had:
  • How much can tiebreakers actually move at the end of an X-round tournament? (Sorry Conley!)
  • Is the MODO metagame rational? Are there “sticky” deck choices (for all you economists out there)? What’s the time lag for information processing? Obviously you could check the IRL metagame as well.
  • Is Yuya a robot?
  • Prices. A long time ago I sent an involved article to Channelfireball about card prices, I am not sure exactly what happened to it. If there is demand for this kind of thing, I would consider trying to find it or redoing it. Essentially I wanted to mythbuster magic finance.

2. Why I would rather be Ari Lax than Yuya Watanabe.

From this point on I will be using theoretical probability tools as well as simulations. For a detailed discussion of why this might matter, check out section 5. Otherwise take my word for it that the theory is sound.

We can measure how good a deck is in a given round by calculating its Expected Winning Percentage: its win percentage in each matchup, weighted by how much of the field each opposing deck makes up. Imagine Yuya has a 10% higher win percentage in every matchup (including the mirror). That’s a pretty substantial edge, especially at the Pro Tour level.
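Here is a minimal sketch of that calculation. The matrix and metagame shares are made up for illustration (they are not the PT RTR numbers), "Other" is a placeholder deck, and `expected_win_pct` and `skill_edge` are names invented for this example.

```python
# Hypothetical match win percentages: MATRIX[a][b] is how often deck a beats deck b.
# These numbers are invented for illustration only.
MATRIX = {
    "Jund":     {"Jund": 0.50, "Affinity": 0.45, "Tron": 0.40, "Other": 0.60},
    "Affinity": {"Jund": 0.55, "Affinity": 0.50, "Tron": 0.45, "Other": 0.55},
    "Tron":     {"Jund": 0.60, "Affinity": 0.55, "Tron": 0.50, "Other": 0.45},
    "Other":    {"Jund": 0.40, "Affinity": 0.45, "Tron": 0.55, "Other": 0.50},
}
# Hypothetical round 1 metagame shares (must sum to 1).
METAGAME = {"Jund": 0.40, "Affinity": 0.15, "Tron": 0.15, "Other": 0.30}

def expected_win_pct(deck, metagame, matrix, skill_edge=0.0):
    """Matchup win rates weighted by how common each opposing deck is."""
    total = 0.0
    for opponent, share in metagame.items():
        # A flat skill edge is added to every matchup, capped at 100%.
        win_chance = min(1.0, matrix[deck][opponent] + skill_edge)
        total += share * win_chance
    return total

average_jund = expected_win_pct("Jund", METAGAME, MATRIX)
yuya_jund = expected_win_pct("Jund", METAGAME, MATRIX, skill_edge=0.10)
average_tron = expected_win_pct("Tron", METAGAME, MATRIX)
print(f"Average Jund: {average_jund:.4f}")
print(f"Yuya on Jund: {yuya_jund:.4f}")
print(f"Average Tron: {average_tron:.4f}")
```

With these invented numbers the skilled Jund pilot starts round 1 ahead of everyone, which is the situation the chart opens with. Modeling the edge as a flat bonus in every matchup is itself an assumption.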

Chart 1.

How do we make sense of this? In round 1 Yuya has the best win percentage of everyone. Yet in round 10, the value of being him with Jund is worse than being an average player with Poison, Eggs or Tron.

If we stop and think, this is just a simple corollary of the previous post. By the time round 10 gets around, all of Jund’s good matchups have been drastically squeezed and its bad matchups have proliferated. This happens because of the popularity of Jund: it knocks its own prey out of contention, while the decks that beat it feed on it and rise. So Jund’s win percentage at the top tables is in constant decline. Yuya still has his 10% edge, but it isn’t enough to overcome his deck selection disadvantage (theoretically anyway, since obviously he top 8s and thus implodes math).
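The squeeze can be sketched with a toy model. This is not the simulation code; it assumes the top tables look like the pool of undefeated players, that pairings inside that pool are random, and that every number below is invented for illustration ("Other" is a placeholder deck, and the function names are made up for this sketch).

```python
# Hypothetical match win percentages: MATRIX[a][b] is how often deck a beats deck b.
MATRIX = {
    "Jund":     {"Jund": 0.50, "Affinity": 0.45, "Tron": 0.40, "Other": 0.60},
    "Affinity": {"Jund": 0.55, "Affinity": 0.50, "Tron": 0.45, "Other": 0.55},
    "Tron":     {"Jund": 0.60, "Affinity": 0.55, "Tron": 0.50, "Other": 0.45},
    "Other":    {"Jund": 0.40, "Affinity": 0.45, "Tron": 0.55, "Other": 0.50},
}
# Hypothetical round 1 metagame shares.
METAGAME = {"Jund": 0.40, "Affinity": 0.15, "Tron": 0.15, "Other": 0.30}

def expected_win_pct(deck, metagame, matrix):
    """Matchup win rates weighted by the share of each opposing deck."""
    return sum(share * matrix[deck][opp] for opp, share in metagame.items())

def advance_undefeated(metagame, matrix):
    """Metagame among players who win this round, renormalized to sum to 1."""
    survivors = {
        deck: share * expected_win_pct(deck, metagame, matrix)
        for deck, share in metagame.items()
    }
    total = sum(survivors.values())
    return {deck: mass / total for deck, mass in survivors.items()}

def jund_trajectory(rounds):
    """Jund's expected win percentage at the undefeated tables, round by round."""
    metagame = dict(METAGAME)
    history = []
    for _ in range(rounds):
        history.append(expected_win_pct("Jund", metagame, MATRIX))
        metagame = advance_undefeated(metagame, MATRIX)
    return history

for round_number, pct in enumerate(jund_trajectory(10), start=1):
    print(f"Round {round_number}: Jund expects to win {pct:.4f}")
```

In this toy field Jund's one good matchup ("Other") is the deck that falls off the top tables fastest, so Jund's expected win percentage drops each round even though no individual matchup number changes.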

3. But can we explain what happened after PT RTR - Jund Edition

I think it’s fairly clear to most of us that the Pre-PT Jund builds were often inferior to what would become the best version of the deck. Deathrite, Liliana and Lingering Souls weren’t even mainstays at that point. As a proxy for how the season developed, I reran the simulation for PT RTR but gave all Jund players a 5% bump in every non-mirror match. How did that change things?

Chart 2.

Note that if the improvements over the course of the last 4 months were even larger, it’s reasonable to see how Jund might have won 75% of the GPs. But a large part of its dominance would still be due to its initial meta size. The “improved” Jund from this case only wins ~52% of its matches. If the newest versions “solve” the Affinity matchup it wouldn’t up their win percentage by that much, but it would have changed their win tournament percentage to the ~35% range.

If an extremely popular deck has a positive expected win percentage (even if that edge is small), it will post DOMINANT results.

I wonder if there is some kind of psychological feedback mechanism in play at this point. The deck wins so people play it. But people playing it means it wins. Thus a deck seems dominant when in reality it would be perfectly rational to play a host of other reasonable choices. #ThinkingCapsOn

I don’t want this blog post to get sidetracked, but I think in the wake of the B&R announcement, it’s easy to see how Wizards might have made a rational overreaction. Banning might have been needed to break up the cultural inertia that had built up behind Jund. The metagame was stale not due to Jund’s dominance but because of its inertia. Bans are a way to encourage diversity by changing people’s perceptions (they think Jund is now as bad as it actually already was), but not the reality.

Let me know if this makes sense. Summarizing:
  • Jund is actually not a great deck (~53% with some bad matchups)
  • But people think it’s great (>60%) so a lot of them play it
  • The combination leads to a lot of success kind of like 10,000 monkeys on 10,000 typewriters. This reinforces the erroneous beliefs.
  • Wizards bans BBE which has zero impact on the actual viability of Jund but makes people adjust their beliefs regarding its power.
  • Now that its perceived power is equal to its actual power, people again begin trying alternatives.
  • Thus the metagame becomes more diverse.
  • If people were completely rational they would have tried new things even without a ban. But we needed a shock to the system because of incorrect perceptions/metagame inertia/some other reason.
Realistically Jund was probably overperforming too much for the above to be true, but I think it’s in the realm of possibility.

4. PT Nagoya

Per PV’s suggestion I thought Nagoya would be an interesting second case because the popular deck was actually very good. As of this very moment the Simulation for the PT is running but I would like to present my estimates based on theory for similar metrics to the last post. If the simulation ends up being drastically different than my predictions it will be reported.

In this case I am much less confident about how I filled in the win percentage matrix since I never played the block format. If someone good wants to double check that for me, shoot me a PM or comment. I also don’t have Infect or Tezzeret variations separated out.

Chart 3.

In this case I think intuition lines up much better with the results. The three best decks in terms of overall win percentages also top 8 the most. The two non-Tempered Steel decks with the best Tempered Steel matchup are the best decks for both top 8ing and winning the tournament. We can take away:
  • If the popular deck is good, it’s a fine play. Especially if you want to top 8 (as opposed to needing to win).
    • If you personally have a good mirror match then the deck becomes a very good choice. Unlike the previous Yuya example, there is no adverse selection in the metagame: your bad matchups don’t get more popular.
  • Beating the most popular deck is much more important if the deck is good. This seems to be independent of your goal (Top 8 v Win) in this case.
  • If the popular deck is good, the number of viable decks is probably much smaller than when the most popular deck is bad (duh?).

5. Theory vs Simulations. Math! Proofs! Almost Rigorous!

Estimating the results by theory has a couple of huge advantages and one disadvantage. The disadvantage is that I have to make even more assumptions. The advantages mostly have to do with speed: I can adjust parameters and see the results instantly.

Here is a comparison of the simulation and my theoretical results for the original PT RTR example. Note that for the theoretical top 8 % I am using the theoretical metagame of X–1s or better. This obviously isn’t exactly equal to the top 8.

Chart 4.

The results are very close for the top 8, and kind of close for the Win %. Not sure if that’s because the simulation has noise, or because the latter theoretical numbers are overburdened by the assumptions. Either way I am pretty comfortable, pending the results of the Nagoya simulation.
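For readers wondering what a “theoretical metagame of X–1s or better” could look like, here is one way to sketch it. This is an illustration, not my actual calculation: it assumes a continuous field (fractions of players rather than individuals), random pairings within each loss bracket, no pair-downs or draws, and an invented matrix and metagame ("Other" is a placeholder deck, and the function names are made up for this sketch).

```python
# Hypothetical match win percentages: MATRIX[a][b] is how often deck a beats deck b.
MATRIX = {
    "Jund":     {"Jund": 0.50, "Affinity": 0.45, "Tron": 0.40, "Other": 0.60},
    "Affinity": {"Jund": 0.55, "Affinity": 0.50, "Tron": 0.45, "Other": 0.55},
    "Tron":     {"Jund": 0.60, "Affinity": 0.55, "Tron": 0.50, "Other": 0.45},
    "Other":    {"Jund": 0.40, "Affinity": 0.45, "Tron": 0.55, "Other": 0.50},
}
# Hypothetical round 1 metagame shares.
METAGAME = {"Jund": 0.40, "Affinity": 0.15, "Tron": 0.15, "Other": 0.30}

def play_round(brackets, matrix):
    """brackets maps a loss count to {deck: fraction of the field}."""
    next_brackets = {}
    for losses, field in brackets.items():
        bracket_size = sum(field.values())
        if bracket_size == 0:
            continue
        for deck, mass in field.items():
            # Chance to win against a random opponent from the same bracket.
            win = sum(matrix[deck][opp] * opp_mass / bracket_size
                      for opp, opp_mass in field.items())
            stay = next_brackets.setdefault(losses, {})
            stay[deck] = stay.get(deck, 0.0) + mass * win
            drop = next_brackets.setdefault(losses + 1, {})
            drop[deck] = drop.get(deck, 0.0) + mass * (1.0 - win)
    return next_brackets

def x_minus_1_or_better(metagame, matrix, rounds):
    """Deck shares among players with at most one loss after the given rounds."""
    brackets = {0: dict(metagame)}
    for _ in range(rounds):
        brackets = play_round(brackets, matrix)
    contenders = {}
    for losses in (0, 1):
        for deck, mass in brackets.get(losses, {}).items():
            contenders[deck] = contenders.get(deck, 0.0) + mass
    total = sum(contenders.values())
    return {deck: mass / total for deck, mass in contenders.items()}

for deck, share in x_minus_1_or_better(METAGAME, MATRIX, rounds=10).items():
    print(f"{deck}: {share:.4f} of the X-1 or better field")
```

A calculation like this returns instantly, which is the speed advantage of theory over an overnight simulation. The price is the extra assumptions listed above.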

6. Take a minute to please tweet this post. You can include me @toordeforce or not. Also feel free to share on fbook.

Don’t worry you can alt-tab. I’ll still be here.

7. Conclusion 

Obviously we are just barely scratching the surface of what’s possible here. I hope to do one last follow-up post on simulating metagames and then move on to other things (possibly one of the questions mentioned previously).