
Statistical vindication of Beyond Silver
When we asked you, our readers, what you wanted to see, at least one of you wanted more numbers.
As a numbers kind of guy myself, I’ve been hard at work scraping and parsing the Isotropic Dominion logs. Without further ado, I present CouncilRoom.com: Dominion Statistics.
The site has four main features:
First, we have graphs of the turn a card is purchased versus its winrate. For instance, as demonstrated in the picture above, buying non-terminal +1 Action/+1 Cards is indeed superior to buying Silver, starting as early as Turn 5. Or you can see that buying Province starts to be better than buying Gold as early as Turn 7:
Second, graphs of card advantage against winrate. For instance, you can see that having a Minion advantage over your opponent(s) substantially increases the likelihood that you’ll win, much more so than having a Market advantage (or even a City advantage):
As you may have noticed, the query field supports some logical operation syntax. Currently, the supported operations are:
- && and || (AND and OR operators, with parentheses support)
- Comparison operators (>, <=, and ==)
- Cost (e.g., “Cost==7” returns Bank, Expand, Forge, and King’s Court)
- Actions (e.g., “Actions > 1” returns all +2 Actions cards)
- Cards (e.g., “Cards <= 1” returns all cards that draw at most +1 Card when played)
- Action/Treasure/Victory (e.g., “Treasure && Victory” returns Harem)
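To make the syntax above concrete, here is a minimal sketch of how such a query could be evaluated against a table of card attributes. It is not the site's actual implementation: the `card` helper, the `CARDS` table, and `matching_cards` are hypothetical names, and the table is a small hand-typed sample rather than the real card list.

```python
def card(cost, actions=0, cards=0, types=()):
    """Build one attribute row; the column names mirror the query keywords."""
    row = {"Cost": cost, "Actions": actions, "Cards": cards,
           "Action": 0, "Treasure": 0, "Victory": 0}
    for type_name in types:
        row[type_name] = 1
    return row

# Hand-typed sample, NOT the site's data file (assumed layout).
CARDS = {
    "Bank": card(7, types=("Treasure",)),
    "Expand": card(7, types=("Action",)),
    "Forge": card(7, types=("Action",)),
    "King's Court": card(7, types=("Action",)),
    "Harem": card(6, types=("Treasure", "Victory")),
    "Village": card(3, actions=2, cards=1, types=("Action",)),
    "Market": card(5, actions=1, cards=1, types=("Action",)),
    "Smithy": card(4, cards=3, types=("Action",)),
    "Silver": card(3, types=("Treasure",)),
}

def matching_cards(query):
    """Return the sorted names of sample cards that satisfy the query."""
    # Translate the C-style operators into Python's boolean keywords.
    expr = query.replace("&&", " and ").replace("||", " or ")
    code = compile(expr, "<query>", "eval")
    # Evaluate with no builtins, exposing only the card's own attributes.
    return sorted(name for name, attrs in CARDS.items()
                  if eval(code, {"__builtins__": {}}, dict(attrs)))

print(matching_cards("Cost==7"))
print(matching_cards("Treasure && Victory"))
```

Using `eval` keeps the sketch short, but it is not safe for untrusted input; a real query box would want a proper expression parser.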
Third, player pages, which contain the player’s game history sorted by opponent. It even accounts for the mess that the BGGDL inflicted on players’ usernames.
Finally, a game search page, for finding games based on players and Kingdom cards.
I realize that this is just the tip of the iceberg of what is possible with this data. So far, I’ve only generated the two graphs above, but I’d love to hear your suggestions and have you rank ideas on the CouncilRoom.com UserVoice page.
For programmers among us, the source code is open and available. Even if you aren’t interested in the code, I could provide a parsed representation of the log data for your own purposes.
There’s much to be learned from all this data. I’m very excited about introducing even more of these kinds of insights on the Dominion Strategy blog: I’ll be supplementing theory’s articles with interesting statistical analyses. Stay tuned!
Statistics is my area of interest.
I have the logs but haven’t had the time to do the parsing. There are some good statistical tools like R that could be used on this data. Who would be good to contact about participating in that research?
I am a software engineer with some interest in statistics, but I don’t have any formal training in stats, nor working knowledge of R.
You can post here, or email me (rrenaud@gmail.com) or geekmail me, or post on uservoice.
I’m loving this so far! However, the syntax for the graphs is a little tricky; for example, I haven’t been able to filter the cards that, say, cost $3 and give +$2.
As of now, I believe that the only options for syntax are:
&& and || (AND and OR operators, with parentheses support)
Cost
Actions
Cards
Action/Treasure/Victory (e.g., Treasure==1 && Victory==1 returns Harem)
We’ll implement more as time passes, and maybe even document it :p
I just nerdgasmed. My university course’s core discipline is operations research, so maximizing the expected winning percentage given a certain set of kingdom cards is right up my alley, and is reason 1.a why I love dominion, and pretty much every game I play (1.b. being “it’s fun”).
I was legitimately scared I would spend more time on this than actual playing… Okay, maybe not, but still, this is pretty cool.
So so so amazing. I love the game search feature, it’s great. And the graphs are awesome. Excited to see what’s to come!
This is so cool.
awesome site
This is so amazing.
Would it be possible to filter your data to only use games where one or maybe both players have a winning lifetime record?
I eventually want to support filtering based on TrueSkill, which is more informative than just a straight-up record (e.g., someone who wins only 45% of his games but only plays league games on BGGDL is a good player).
I can’t tell you how stellar this is. I’ve sifted through some of the data dumps from isotropic with R, but I never made pretty graphs or had this kind of flexibility.
Thanks!!
I love you so much for making this tool. Thank you rrenaud!
Excellent! I can tell already I’m going to spend way too much time looking at these graphs.
FYI, Cellar is listed as a two-action card. And the filters recognize Action, Treasure and Victory, but not Reaction, Duration or Attack. I would be particularly interested in how the attack cards rank against each other.
The data file containing the card attributes is here:
http://dominionstats.googlecode.com/svn/trunk/card_list.csv
If you want to fix it up, add some columns for reaction, duration, attack types, etc, and mail it back to me at rrenaud@gmail.com, I’ll commit it.
This is very cool. Any chance you can filter the “graphs of card advantage against winrate” graph to show only cards that give a + win rate with an advantage?
I’d love to be able to look at one graph that just shows all of the cards that seem to increase win rate when you have more of them (the graphs get too busy to see clearly when you have too many cards on them).
Thanks again for doing this!
See here:
http://councilroom.uservoice.com/forums/99487-general/suggestions/1459199-let-us-filter-by-result-not-card-characteristics?ref=title
But in short,
MeanVarStat(all_card_data.card_stats[Singular].win_any_accum).Mean()%20%3E%201.04
The difference in bought-gained is very cool. However, does it also include the number trashed? It’d be interesting to see just the win-rate versus the card differential at the end of the game. Actually both would be interesting because you might see some cards that are good early (and thus show a bought-gained differential) but then are commonly trashed later in the game (e.g. Lookout?).
It doesn’t take into account trashing at all. Even if it did, I suspect the results wouldn’t change very much.
Also, isn’t the bought-gained graph symmetrical across the y-axis? Maybe showing it from [0:*] would give more screen real estate without losing data.
It’s not always symmetrical, due to weird multiplayer effects. Having both sides of the graph is also a little illuminating itself for seeing the negative effects.
It’s interesting that the win-rate for buying copper becomes positive at turn 14. I assume that’s just the gardens effect.
Gardens, Goons, Trade Routes… There are a few things that would make one want to buy coppers late in the game. If you’re buying one then at all, you must have a good reason for it.
Oh, I’m so torn! This is cool, and I would even love to be able to contribute to a project like this, but it’s written in Python! Is this going to finally be the thing that makes me suppress my hatred of Python and try to learn it??
Learning Python could be useful, but you could meaningfully contribute something in a different language. As long as your language of choice has MongoDB bindings, there could be a way to get two different languages playing nicely with each other.
Also, there are a lot of interesting things to be done on the frontend with JavaScript.
Are there particular things that you’re looking for people to contribute? I was thinking it might be helpful to just submit small improvements, which would need to be in the existing language, rather than larger modules. (For example, on the player history page, it shows their history vs. specific players, but I was thinking it might be nice if you could see their total history as well. That seems like a small change, one that I’d want to just make to the existing source.)
Small changes are a good way to get your feet wet with the code.
By total history do you mean overall record? That sounds fine to me.
I updated the CouncilRoom.com front page with ideas for potential contributions:
Contributions
I’d also be very happy to have outside contributions to the project. Some of the highest rated ideas can be implemented without having to learn all that much of the code.
Vanity Accomplishments
Graphic design skills to create the badges.
Some creative and fun ideas as to what is badge worthy.
A little bit of programming ability to detect the novel events.
Add Visualizations for individual games: You just need to come up with good ideas as to what is worth seeing and how to display it.
These are indeed amazing to play with!
Just a cautionary note as we drool over these charts… In our enthusiasm we should be careful not to forget the old correlation vs. causation distinction in statistics. It sounds like there are many here with stronger math skills than I who can hopefully clarify this. But I wonder, for example, when you assert after the first graph that “as demonstrated in the picture above, buying non-terminal +1 Action/+1 Cards is indeed superior to buying Silver”, is that really a valid conclusion based on the correlation established? I don’t know if it is or not, I’m just raising the question…
It is certainly evidence that buying the spammable cards is better overall than buying silvers at that point in the game. It is not conclusive proof of the statement, however. You can come up with a whole bunch of possible worlds in which this graph would be unchanged, but in which Silver is still a better buy than village on turn 8. Consider the (exceedingly unlikely) possibility that only experts have discovered the value of mid-game villages, and they win more than half of their games anyway, even when they don’t buy silvers.
On the other hand, it is yet more reason to believe that indeed, you should prefer the spammable actions rather than silvers after your 2nd or 3rd shuffle.
One criterion for establishing causation instead of just association is to provide a plausible (and preferably testable) explanation for what has been observed.
As rrenaud points out, this is certainly evidence in favor of the spammable actions. More important is the development of a sound theory that explains why this is a reasonable action. Such explanations have been covered in other articles and seem quite valid to me. This just helps support the validity of those theories.
“buying Province starts to be better than buying Gold as early as Turn 7”
This assertion is more in doubt than the +1/+1 vs. Silver assertion.
Players in the Province-gaining set were able to gain an $8 on turn 7. All we know about players in the Gold-gaining set is that they were able to gain a $6 on turn 7. Being able to generate more money (or a larger gain effect) on turn X is predictive of winning, regardless of what you buy.
To really draw this conclusion we would have to be able to limit the Gold-gaining set to folks who could have gained a Province on turn 7 but went for Gold instead.
Huge kudos for setting up the code and the data, and fielding some preliminary conclusions! We’ll get sorted what we can say about what data soon enough.
So starting at turn 15, it is preferable to buy an Estate over a Duchy?
http://councilroom.com/win_weighted_accum_turn.html?cards=Silver,Cost%3D%3D3%20%26%26%20Actions%3E%3D1%20%26%26%20Cards%3E%3D1
I think you mean to make this link?
http://councilroom.com/win_weighted_accum_turn.html?cards=Estate,Duchy
This kind of absolute interpretation of the data is one of the dangers of this type of information. As joel88 so rightly pointed out, correlation is not causation. Just because the graph shows the Estate with a greater winning rate does not mean the Duchy buy is weaker.
There are a couple of problems with interpreting the information. For example, the Duchy win rate suffers from the fact that if you can afford a Duchy you would often have preferred to be able to afford a Province. It is quite possible for the Duchy to be the best possible play available and still have an inferior win rate (your opponent is often either buying a Province and gaining a lead or buying a Duchy and negating yours). Whereas if you are buying late-game Estates, you are typically doing so either as an additional buy to win a tiebreaker, or you are trying to empty the Estates for a VP win, or you are trying to eke out a close victory and really need that last VP.
When comparing two graphs it is quite possible for card A to have a win rate greater than card B yet still have card B be a better purchase than card A. The graphs are useful and very important, but careful interpretation is required.
Here’s a slightly more ridiculous example:
http://councilroom.com/win_weighted_accum_turn.html?cards=Curse,Estate
So yeah. Think about what games the data would be coming from before trying to make any grand generalizations.
A lot of those late game curse gains are +5 VP due to buying embargoed provinces, and it doesn’t count being forced to take a curse from an attack.
I’m curious now how a bot that blindly uses this sort of data would actually play.
Buys are “simple”: just filter by the amount of money the bot has and buy the “best” card on the board for the turn count. Actually playing cards would require similar data for cards played. That would take some work, but should be doable. Play all nonterminals, then play terminals in order of statistical winningness. Handling card trashers would require similar data for cards trashed. If forced to discard, discard the cards that are the worst to play, using the same data set used to prioritize actions. The one thing I can’t see there being a blind statistical solution for is cards with options.
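The buy rule described above can be sketched in a few lines. This is only an illustration: `best_buy` and the table layout (card name, then turn, then win rate) are assumptions, and the win rates below are made-up placeholders, not values from the CouncilRoom data.

```python
def best_buy(win_rates, costs, coins, turn):
    """Pick the affordable card with the highest recorded win rate this turn.

    win_rates maps card name -> {turn: win rate}; costs maps card -> coin cost.
    Returns None when no affordable card has data for this turn.
    """
    candidates = [
        (rates[turn], card)
        for card, rates in win_rates.items()
        if costs[card] <= coins and turn in rates
    ]
    return max(candidates)[1] if candidates else None

COSTS = {"Copper": 0, "Silver": 3, "Gold": 6, "Province": 8}

# Made-up placeholder numbers for illustration only.
WIN_RATES = {
    "Copper": {7: 0.80},
    "Silver": {7: 0.95},
    "Gold": {7: 1.05},
    "Province": {7: 1.20},
}

print(best_buy(WIN_RATES, COSTS, coins=6, turn=7))
```

Note that the rule looks only at the current turn and coin count, so it knows nothing about what is already in the deck.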
I wonder if it would beat a big money bot or if it would wind up doing stupid things like chapeling all its copper before buying silver and then buying curses instead of copper.
Maybe you could convince the simulate dominion guys to try to use the data, or use a simulation framework yourself? http://www.boardgamegeek.com/filepage/49502/dominion-simulation-python-framework
The data is available here, and though it looks messy to a human, if you run it through a JSON parser, it becomes pretty clear.
http://councilroom.com/static/output/all_games_card_stats.json
Lots of comments about curses vs copper. Maybe I should remove the embargo curse gains from the data?
The problem with an AI using this data is that it doesn’t consider combinations at all, and its action balance will be very out of whack. Have 4 smithies and no +actions? Another smithy for you! But I could see a mixed rule-based and data-based AI doing better than either alone.
Your graphs are great and I don’t think you need to change them. People just need to not use them blindly to draw conclusions. And they are insufficient for programming a bot to play the game. Too much contextual information is missing.
You have already given the Dominion world access to important information for deeper analysis. That analysis isn’t going to be done with your charts and graphs alone.
So I say Bravo! and don’t worry about weird results like the Curse data. Your only mistake so far was to write an introductory article drawing conclusions from the graphs only to run into people pointing out the severe limitations of any such claim.
Very interesting, well done!
One of the first ones I searched was Duchy vs. Gold. I think it’s pretty tough to choose when to start buying up duchies on 7 coins. Looks like Duchy is always a slightly suboptimal buy but close to unity. I guess people who get 5 instead of 8 just lose to the people who get 8 each turn? Interesting to see gold slope off after turn 13 though, seems about right.
Wow, this info is so fantastic. I really am enjoying looking at these graphs.
I do, however, have a question. Maybe I “man-looked” but I cannot find how you define “win rate.” My question is: how is it possible to have a win rate greater than 1?
Win rates are a fair generalization of winning probability across games with varying numbers of players. You get n win points for winning an n-player game. The win rate of an event is the average number of win points for that event. For a 2-player game, winning percentage = win rate / 2.
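As a small worked example of that definition, here is a sketch in Python. The `win_rate` function and the `(num_players, won)` record format are assumptions made for illustration, and ties are ignored.

```python
def win_rate(results):
    """Average win points over (num_players, won) records.

    A win in an n-player game scores n win points; a loss scores 0.
    """
    points = [num_players if won else 0 for num_players, won in results]
    return sum(points) / len(points)

# Three wins and one loss in 2-player games.
two_player = [(2, True), (2, True), (2, True), (2, False)]
rate = win_rate(two_player)
print(rate, rate / 2)  # win rate, then winning percentage for 2-player games

# One win in three 3-player games is exactly break-even.
print(win_rate([(3, True), (3, False), (3, False)]))
```

Under this scoring, a player who wins exactly their fair share of games (1 in n) has a win rate of 1, which is why anything above 1 means better than break-even.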
As long as we’re asking possibly dumb (sorry, no offence, Fbomb!) questions for us non-graphgeeks, what do the N-S lines extending from each turn-node represent?
They show you how sure I am of the data. The larger the bars, the less certain the measure. They are ±2 standard deviation error bars. Basically, with 95% confidence, the true value of the statistic lies within those two error bars. So you can see in the silver graph, for example, that the line starts to go a bit crazy towards the right, and the error bars get big. There aren’t many games in which a player buys silver on, say, turn 33, so the graph is trying to be honest with you and admit that it doesn’t have a very precise estimate for the chance they will win.
http://www.quora.com/What-is-a-confidence-interval-in-laymans-terms
Don’t put too much emphasis on the quality of those confidence bars.
Using a 2 standard deviation bar is roughly equivalent to generating a 95% confidence interval if several assumptions are true and the standard deviation that is being calculated is actually the standard error. And with 30+ bars some adjustment for family-wise error rate is appropriate.
So just view the bars as telling you how much information actually exists about the true win rate.
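For anyone who wants to see the arithmetic, here is a minimal sketch of a mean with ±2 standard error bars. It assumes the bars are built from the standard error of the mean, which this thread does not confirm for the site's graphs; `mean_with_error_bars` is a hypothetical name.

```python
import math

def mean_with_error_bars(samples, width=2.0):
    """Return (mean, low, high) using +/- `width` standard errors."""
    n = len(samples)
    mean = sum(samples) / n
    # Sample variance with Bessel's correction (n - 1).
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    # The standard error of the mean shrinks as the sample count grows.
    std_error = math.sqrt(variance / n)
    return mean, mean - width * std_error, mean + width * std_error

# Win points from 2-player games: 2 for a win, 0 for a loss.
few_games = [2, 0, 2, 0]
many_games = [2, 0] * 200

print(mean_with_error_bars(few_games))
print(mean_with_error_bars(many_games))
```

Both samples have the same mean, but the bars around the larger sample are much tighter, which matches the way the bars on the graphs widen at turns where few games contribute.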
Familywise correction meaning that the trend around points on a given line gives information about its neighbors?
Then the confidence intervals are actually larger than they need to be for 95% confidence, right?
For example, on the silver graph, for turns 5 through 15, the range for each point is definitely inside a line drawn between the previous and next neighbor.
Edit: No, from my reading of wikipedia, familywise error rates are correcting for making multiple observations. If a single observation is wrong 5% of the time, but you made 50 observations, the chance that you are wrong somewhere is pretty damn high.
I am just being very pedantic here. Don’t mind me.
With 30 plus observations it is likely that at least 1 (and probably 2) of the bars do not contain the true win rate. This says nothing about how wrong they are (usually not very much) or whether you even care.
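The arithmetic behind that estimate, as a quick sketch. It assumes each bar independently covers the true value 95% of the time, which is a simplification for neighboring turns on the same line; the function names are hypothetical.

```python
def chance_of_any_miss(num_bars, coverage=0.95):
    """Probability that at least one of num_bars independent bars misses."""
    return 1 - coverage ** num_bars

def expected_misses(num_bars, coverage=0.95):
    """Expected number of bars that fail to contain the true value."""
    return num_bars * (1 - coverage)

print(round(chance_of_any_miss(30), 3))  # about 0.785
print(round(expected_misses(30), 2))     # about 1.5
print(round(chance_of_any_miss(50), 3))  # about 0.923
```

So with 30 bars, the chance that at least one misses is roughly 79%, and 1 to 2 misses is the typical outcome.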
As long as you don’t oversell what the bars are saying I don’t see why you would want to use anything more complex.
Not sure if this will get noticed here, but it was the best location I could find. I was looking at one of my games and noticed a bug. In the deck summary that floats at the bottom of the page, my opponent’s treasures that I trashed with Pirate Ship showed up as trashed from my deck instead.
http://councilroom.com/game?game_id=game-20110414-180552-539e4371.html
Yeah, it’s a known issue. I am mid way through a rewrite of the game log parser. It’s going to be much more general and less hacky. I’ll make bugs like that never happen by simulating my recorded data and making sure it matches the final game state in the logs.
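A rough sketch of that kind of replay check, to show the idea. The `(player, verb, card)` event format and the function names are hypothetical; the real parser's representation is not described here.

```python
from collections import Counter

def replay(starting_decks, events):
    """Apply (player, verb, card) events and return each player's final deck."""
    decks = {player: Counter(cards) for player, cards in starting_decks.items()}
    for player, verb, card in events:
        if verb == "gain":
            decks[player][card] += 1
        elif verb == "trash":
            if decks[player][card] == 0:
                raise ValueError(f"{player} cannot trash a {card} they do not hold")
            decks[player][card] -= 1
        else:
            raise ValueError(f"unknown verb: {verb}")
    # Unary plus drops cards whose count has fallen to zero.
    return {player: +deck for player, deck in decks.items()}

def mismatches(starting_decks, events, final_decks):
    """List players whose replayed deck differs from the logged final deck."""
    replayed = replay(starting_decks, events)
    return [player for player, deck in final_decks.items()
            if replayed.get(player, Counter()) != Counter(deck)]

START = {"alice": {"Copper": 7, "Estate": 3}, "bob": {"Copper": 7, "Estate": 3}}
FINAL = {"alice": {"Copper": 7, "Estate": 3, "Silver": 1, "Pirate Ship": 1},
         "bob": {"Copper": 7, "Estate": 3}}
GAINS = [("bob", "gain", "Silver"), ("alice", "gain", "Silver"),
         ("alice", "gain", "Pirate Ship")]

# Pirate Ship trash credited to the victim (correct) vs. the attacker (the bug).
print(mismatches(START, GAINS + [("bob", "trash", "Silver")], FINAL))
print(mismatches(START, GAINS + [("alice", "trash", "Silver")], FINAL))
```

With the trash credited to the wrong player, both replayed decks disagree with the final state, so a mis-attribution like the Pirate Ship one would be flagged.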