Automated Commander Power Level Evaluation

Post by **Rumpy5897** » 4 years ago

Sure, here's a stab at it. Setting k to a negative value turns the model into logistic decay, which fits the situation better: the higher the CMC, the lower the multiplier. The further k is from 0, the steeper the curve, while x0 is the point at which the function reaches half its maximum value. I used 1 in the numerator, so at x0 the proposed multiplier is 0.5. You can mess with this as you see fit. I went with the mid-point at a CMC of 3 as it makes sense to me conceptually: it's the tipping point between CMC working for you and against you.

Is this helpful enough, or do you want some other form?

import matplotlib.pyplot as plt
import numpy as np

k = -1
x0 = 3
x = np.linspace(0, 10, 101)  # CMC values to plot (range is arbitrary)
plt.plot(x, 1/(1+np.exp(k*(x0-x))))
plt.show()
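For a quick feel of the shape without plotting, the same expression can be evaluated at a handful of CMC values. This is only a sketch: the name `logistic_multiplier` and the sample inputs are made up for illustration, and it uses `math.exp` on plain floats instead of numpy arrays.

```python
import math

def logistic_multiplier(x, k=-1.0, x0=3.0):
    """Logistic curve from the post: 1 / (1 + e^(k * (x0 - x)))."""
    return 1.0 / (1.0 + math.exp(k * (x0 - x)))

# With k < 0 the value falls as x (average CMC) rises, and is 0.5 at x0.
for cmc in (1, 2, 3, 4, 5):
    print(cmc, round(logistic_multiplier(cmc), 3))

# A k further from 0 gives a sharper drop around x0.
print(round(logistic_multiplier(2, k=-3), 3), round(logistic_multiplier(4, k=-3), 3))
```

Changing `x0` slides the whole curve left or right; changing `k` only changes how abruptly it falls.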

Post by **3drinks** » 4 years ago

pokken wrote: ↑
4 years ago

3drinks wrote: ↑
4 years ago
No idea how to use this, but feel free to run my Kari Zev through this. I suspect it should rank high power, though I have the cards to add in at least Dualcaster/Twinflame if my meta called for such a strategy.
edit: made some updates to the curve multiplier that changed the end total slightly.
+--------------------------------------------------+
|           3Drinks_Kairi-Sane -- 272.42           |
+-----------------------------+-----------+--------+
| Card Name                   | Commander | Points |
+-----------------------------+-----------+--------+
| KARI ZEV, SKYSHIP RAIDER    | Yes       |      0 |
| MANA CRYPT                  | No        |    100 |
| SOL RING                    | No        |     85 |
| ANCIENT TOMB                | No        |     20 |
| PURPHOROS, GOD OF THE FORGE | No        |     15 |
| SKULLCLAMP                  | No        |     12 |
| SCALDING TARN               | No        |     10 |
| ARID MESA                   | No        |      5 |
| BLOODSTAINED MIRE           | No        |      5 |
| WOODED FOOTHILLS            | No        |      5 |
+-----------------------------+-----------+--------+
Scores in the 250s usually mean you're playing some good mana acceleration. Looking at your deck it's likely that the payoffs are not being assessed quite correctly since it looks fairly strong to me.

Cards like dreadhorde arcanist and sword of feast and famine need to have points eventually, and I have not gotten to categorizing the mid power cards yet. Hopefully I'll get some volunteers soon.

It is strong, and not just for its mana. What do you mean the payoffs aren't assessed correctly? Did it calc the whole list? If so, why is gauntlet of might not listed? Does it just not list cards that score a 0 next to them?

Post by **pokken** » 4 years ago

Rumpy5897 wrote: ↑
4 years ago
Sure, here's a stab at it. Setting k to a negative value turns the model into logistic decay, which fits the situation better: the higher the CMC, the lower the multiplier. The further k is from 0, the steeper the curve, while x0 is the point at which the function reaches half its maximum value. I used 1 in the numerator, so at x0 the proposed multiplier is 0.5. You can mess with this as you see fit. I went with the mid-point at a CMC of 3 as it makes sense to me conceptually: it's the tipping point between CMC working for you and against you. Is this helpful enough, or do you want some other form?
import matplotlib.pyplot as plt
import numpy as np

k = -1
x0 = 3
x = np.linspace(0, 10, 101)  # CMC values to plot (range is arbitrary)
plt.plot(x, 1/(1+np.exp(k*(x0-x))))
plt.show()

I've never used numpy before so I'm a bit fuzzy on how to leverage that into a formula I can use to calculate a multiplier -- and I'd love to avoid using an additional module if I can to keep it simple.

My current code looks like:

    max_curve_mult = 2
    min_curve_mult = 0.5
    offset_curve_mult = -0.25

    ## calculate the curve multiplier
    def get_curve_mult(self):
        mult = round(1/math.log(self.offset_curve_mult + self.get_average_cmc()), 2)
        if mult > self.max_curve_mult:
            return self.max_curve_mult
        elif mult < self.min_curve_mult:
            return self.min_curve_mult
        else:
            return mult
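One thing worth noting about the snippet above: the expression misbehaves when the offset plus the average CMC is 1 or less. ln(1) is 0, so the division fails, and below 1 the logarithm is negative, so the result would clamp to the minimum instead of the maximum. A minimal standalone sketch that guards that case, assuming such low-curve decks should simply get the cap (that choice, and the free-function form, are assumptions rather than the tool's actual code):

```python
import math

MAX_CURVE_MULT = 2
MIN_CURVE_MULT = 0.5
OFFSET_CURVE_MULT = -0.25

def curve_mult(average_cmc):
    """Clamped 1/ln(offset + average CMC), mirroring the class method above."""
    shifted = OFFSET_CURVE_MULT + average_cmc
    if shifted <= 1:
        # ln(shifted) is zero, negative, or undefined here; assume the cap applies.
        return MAX_CURVE_MULT
    mult = round(1 / math.log(shifted), 2)
    # Same clamping as the if/elif/else chain, written with min/max.
    return max(MIN_CURVE_MULT, min(MAX_CURVE_MULT, mult))
```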

I guess what I'm not understanding is what:

np.exp(-1 * (3 - x)) means exactly.

The "exponential of all elements in the input array" description is confusing.

Sorry for being obtuse there, just never used numpy

Do you think you can reduce it to a mathematical formula in vanilla Python?

Post by **pokken** » 4 years ago

3drinks wrote: ↑
4 years ago

It is strong, and not just for its mana. What do you mean the payoffs aren't assessed correctly? Did it calc the whole list? If so, why is gauntlet of might not listed? Does it just not list cards that score a 0 next to them?

If a card isn't listed it's 0.
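In code terms that behaviour might look something like the sketch below. The `CARD_POINTS` table and the `score_cards` helper are hypothetical stand-ins (only the three point values are taken from the report above); the real list lives in the tool's points data.

```python
# Hypothetical points table; values copied from the report earlier in the thread.
CARD_POINTS = {"MANA CRYPT": 100, "SOL RING": 85, "ANCIENT TOMB": 20}

def score_cards(card_names, points=CARD_POINTS):
    """Return (total, rows), where rows only include cards with nonzero points."""
    rows = []
    for name in card_names:
        value = points.get(name.upper(), 0)  # unlisted cards are worth 0
        if value:
            rows.append((name.upper(), value))
    rows.sort(key=lambda row: row[1], reverse=True)
    return sum(value for _, value in rows), rows

total, rows = score_cards(["Sol Ring", "Gauntlet of Might", "Mana Crypt"])
print(total, rows)
```

Gauntlet of Might is still counted in this sketch; it just contributes 0 and so never shows up as a row.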

The payoffs for the good mana aren't being assessed because I don't have them documented yet; things like inferno titan + torbran, thane of red fell are not currently being factored in.

I think I also missed gamble and a bunch of strong interaction pieces like red elemental blast.

Bottom line this deck is a good one to analyze since it's got a bunch of strong cards I haven't put in the list yet, and also some synergies.

Long way to go at getting the card values together

Post by **Rumpy5897** » 4 years ago

exp() is e to the power of whatever the argument is. Your code snippet uses math, so I'll use that too - it isn't a builtin, but it ships with Python's standard library, so it only needs an import. Something like this will do the trick, just a basic one-line function that ingests your computed average CMC and spits out a candidate multiplier. Feel free to tinker as desired, of course.

import math

def cmc_multiplier(cmc, k=-1, x0=3):
    return 1/(1+math.exp(k*(x0-cmc)))

Interesting how we come from different programming realms. I see your fancy class definitions and some primal part of me starts looking for the door. I never got good at object stuff.

Post by **pokken** » 4 years ago

Rumpy5897 wrote: ↑
4 years ago
exp() is e to the power of whatever the argument is. Your code snippet uses math, so I'll use that too - it isn't a builtin, but it ships with Python's standard library, so it only needs an import. Something like this will do the trick, just a basic one-line function that ingests your computed average CMC and spits out a candidate multiplier. Feel free to tinker as desired, of course.
import math

def cmc_multiplier(cmc, k=-1, x0=3):
    return 1/(1+math.exp(k*(x0-cmc)))
Interesting how we come from different programming realms. I see your fancy class definitions and some primal part of me starts looking for the door. I never got good at object stuff.

That should do it. math ships with Python as part of the standard library; you just have to import it. I'm not opposed to imports, I just want to avoid having to learn a bunch of new external libraries and carry them as dependencies when I compile it.

My background is primarily C# and Java, and mostly web design type stuff. I picked up Python to do a website and loved it, but yeah I'm very, er, classy. My math stopped at Calc 2 and it's 20 years untouched.

I'll work on getting this into the code this evening.

Post by **pokken** » 4 years ago

Kenrith, the Returned King	Commander	300
Tymna the Weaver	Commander	200
Niv-Mizzet Reborn	Commander	150
Thrasios, Triton Hero	Commander	175
Najeela, the Blade-Blossom	Commander	150
The First Sliver	Commander	125
Ephara, God of the Polis	Commander	50
Animar, Soul of Elements	Commander	90
Maelstrom Wanderer	Commander	80
Karametra, God of the Harvest	Commander	45
Feather, the Redeemed	Commander	60
Zada, Hedron Grinder	Commander	40
Krenko, Mob Boss	Commander	60
Niv-Mizzet, Parun	Commander	80
Kiki-Jiki, Mirror Breaker	Commander	70
Purphoros, God of the Forge	Commander	70
Zur the Enchanter	Commander	100
Golos, Tireless Pilgrim	Commander	90
Teferi, Temporal Archmage	Commander	125

If someone wants to take that list of commander points and just go ham on rating every commander and correcting my ratings I would be most obliged. The biggest thing this list is missing is that there are a crapload of 50 point commanders that see play, and the second is that my points are horrible. Build-around commanders like zada, hedron grinder probably need to be ranked much higher because they make up so much of a deck's power.

Ambitiously I would love to have like, the top 200 commanders at least, more like 500.

Just cranking through this list would be super helpful:
https://www.mtggoldfish.com/metagame/co ... full#paper

Post by **plushpenguin** » 4 years ago

Here's a test. I don't know where cards like Utopia Sprawl, Mirri's Guile, and Collector Ouphe stand yet, but well... there's just a lot of cards. I run a lot of... unconventional but effective stuff in this list, so I'm curious as to what score it gets.

plush_xen_bm.txt

Post by **hyalopterouslemur** » 4 years ago

darrenhabib wrote: ↑
4 years ago
I do feel like your target for everything at the moment is "how much of my deck has some cEDH staples?".
And in that respect it's never going to be a true "rate my deck".

Basically. In reality, automated systems have trouble with large sets of options. What humans do is, we discount the obvious bad options, then do the same with randomly selected remaining options until we get to what we want.

What this app does is try to say what the most powerful cards are. But then you have to figure in synergy, which brings up "Is this worth it?" (Why token players are willing to run three of the mutation cycle but not that one. Why lifegain players run Absorb but for everyone else it's just another Cancel.) Synergy includes combos. (A Nekusar deck will run Bloodchief Ascension, which basically says "anything which can mill 20 is now good".) But you also have to figure out mana cost, which is, on its own, something you want low, but what are you willing to sacrifice for a cheap card? If you're in 3+color, are you willing to play a triple-colored mana cost? One color leaves blind spots, but each additional color increases your risk of color screw. And if you play a lot of dual lands, say hello to my pals Blood Moon and Back to Basics.

Post by **pokken** » 4 years ago

plushpenguin wrote: ↑
4 years ago
Here's a test. I don't know where cards like Utopia Sprawl, Mirri's Guile, and Collector Ouphe stand yet, but well... there's just a lot of cards. I run a lot of... unconventional but effective stuff in this list, so I'm curious as to what score it gets.

plush_xen_bm.txt

Bear in mind I am cramming the first draft of the new curve algorithm in there over a few minutes at lunch, so all the scores are going to be different; see the reference post for comparisons.

+---------------------------------------------------+
|      Plushpenguin_xen_bm Raw:502 Adj:205.09       |
+---------------------------+-------------+---------+
| Card Name                 | Commander   |  Points |
+---------------------------+-------------+---------+
| XENAGOS, GOD OF REVELS    | Yes         |       0 |
| MANA CRYPT                | No          |     100 |
| SOL RING                  | No          |      85 |
| CARPET OF FLOWERS         | No          |      25 |
| DOCKSIDE EXTORTIONIST     | No          |      25 |
| WHEEL OF FORTUNE          | No          |      25 |
| ANCIENT TOMB              | No          |      20 |
| GAMBLE                    | No          |      20 |
| SYLVAN LIBRARY            | No          |      20 |
| WILD GROWTH               | No          |      20 |
| BIRDS OF PARADISE         | No          |      15 |
| WORLDLY TUTOR             | No          |      15 |
| COMMAND TOWER             | No          |      10 |
| CROP ROTATION             | No          |      10 |
| FINALE OF DEVASTATION     | No          |      10 |
| MISTY RAINFOREST          | No          |      10 |
| SCALDING TARN             | No          |      10 |
| SENSEI'S DIVINING TOP     | No          |      10 |
| BLOOD MOON                | No          |       8 |
| ARID MESA                 | No          |       5 |
| BEAST WITHIN              | No          |       5 |
| BLOODSTAINED MIRE         | No          |       5 |
| CHAOS WARP                | No          |       5 |
| GREEN SUN'S ZENITH        | No          |       5 |
| PYROBLAST                 | No          |       5 |
| RED ELEMENTAL BLAST       | No          |       5 |
| VEIL OF SUMMER            | No          |       5 |
| VERDANT CATACOMBS         | No          |       5 |
| WINDSWEPT HEATH           | No          |       5 |
| WOODED FOOTHILLS          | No          |       5 |
| FORCE OF VIGOR            | No          |       3 |
| MAGUS OF THE MOON         | No          |       3 |
| TAIGA                     | No          |       2 |
| NATURE'S LORE             | No          |       1 |
+---------------------------+-------------+---------+

Post by **pokken** » 4 years ago

hyalopterouslemur wrote: ↑
4 years ago

darrenhabib wrote: ↑
4 years ago
I do feel like your target for everything at the moment is "how much of my deck has some cEDH staples?".
And in that respect it's never going to be a true "rate my deck".
Basically. In reality, automated systems have trouble with large sets of options. What humans do is, we discount the obvious bad options, then do the same with randomly selected remaining options until we get to what we want.

What this app does is try to say what the most powerful cards are. But then you have to figure in synergy, which brings up "Is this worth it?" (Why token players are willing to run three of the mutation cycle but not that one. Why lifegain players run Absorb but for everyone else it's just another Cancel.) Synergy includes combos. (A Nekusar deck will run Bloodchief Ascension, which basically says "anything which can mill 20 is now good".) But you also have to figure out mana cost, which is, on its own, something you want low, but what are you willing to sacrifice for a cheap card? If you're in 3+color, are you willing to play a triple-colored mana cost? One color leaves blind spots, but each additional color increases your risk of color screw. And if you play a lot of dual lands, say hello to my pals Blood Moon and Back to Basics.

The thing to remember here is that most themes are weak. Detecting if your elf deck is playing good elves vs. a bunch of elves isn't going to change much vs. knowing that you have craterhoof behemoth and finale of devastation in your deck, or ezuri, renegade leader is your commander.

It's also pretty feasible to make assumptions about a deck and be correct 80% of the time -- for example:
* Deck has aetherflux reservoir - it doesn't matter if it's a lifegain deck or a storm deck, the card is a powerful payoff and we can just assume people are using it at least somewhat well. If we're wrong...we're wrong, but it's not going to be that often someone just puts it in there for no reason.

* Deck has coat of arms in it; we can almost surely assume they are trying to do something good with it, and give them some number of points.

One thing I haven't explored is giving a healthy number of points to all the "payoff" cards and just assuming they will be pretty good. There aren't *that* many good payoff cards for the various strategies that aren't awful.

I could give you 50 points for playing astral slide and just assume you're doing something decent with it without checking your cycling count, and be more accurate than not.

It doesn't much matter whether I count your cycling cards at 1 pt each because astral slide is in there, and your cycling deck comes out at 38 points instead of 50. I've just got to be close.

Same with serra's sanctum and enchantments. I can just assume it's 80 points and if your deck is only making 50 points of value out of it, it's not really *that* big of a deal.

Post by **Rumpy5897** » 4 years ago

Yeah, I can see why you capped your version. Some of the decks (for example the Wanderer piles) feel like they're being overly penalised by the current system. Would need a ramp multiplier, which in turn opens the whole utility multiplier can of worms again. Also, interestingly, I just noticed 4cEdric scoring super high. That deck does nothing. It's packed to the gills with goodstuff, but has unbelievable trouble closing.

Post by **pokken** » 4 years ago

Rumpy5897 wrote: ↑
4 years ago
Yeah, I can see why you capped your version. Some of the decks (for example the Wanderer piles) feel like they're being overly penalised by the current system. Also, interestingly, I just noticed 4cEdric scoring super high. That deck does nothing. It's packed to the gills with goodstuff, but has unbelievable trouble closing.

I wound up adding a 0.2 floor to yours as well because of a couple of weird outliers (like that super high CMC golos deck that got monstrous penalties).

The curve system is not perfect but I think with your logistic decay curve it's closer to reality than not.
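A floored version of the multiplier could look roughly like this. How the floor is actually wired into the tool isn't shown in the thread, so the `floor` parameter and the `adjusted_score` helper (which assumes the adjusted total is just the raw total times the multiplier) are guesses for illustration.

```python
import math

def cmc_multiplier(cmc, k=-1, x0=3, floor=0.2):
    """Logistic decay multiplier that never drops below `floor`."""
    return max(floor, 1 / (1 + math.exp(k * (x0 - cmc))))

def adjusted_score(raw_points, average_cmc):
    # Assumption: adjusted score = raw points * curve multiplier.
    return round(raw_points * cmc_multiplier(average_cmc), 2)

print(adjusted_score(500, 3))  # multiplier is 0.5 at the midpoint
print(adjusted_score(500, 8))  # very high curve, so the floor kicks in
```

Without the floor, an average CMC of 8 would leave under 1% of the raw score, which is the kind of monstrous penalty the floor is there to prevent.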

Long term I would envision some sort of ramp assessment that grades your curve on a curve by how much you're ramping and normalizes it. Say if a huge percentage of your deck points are ramp then you get less of a reduction.
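One very rough way to express that idea: shrink the curve penalty in proportion to the share of deck points that come from ramp. Everything here is hypothetical (the function name, the linear blend, and the notion of a separate `ramp_points` total); it is only meant to show the shape of the adjustment.

```python
def ramp_adjusted_multiplier(curve_mult, ramp_points, total_points):
    """Pull the curve multiplier toward 1.0 in proportion to the ramp share."""
    if total_points <= 0:
        return curve_mult
    # Fraction of the deck's points that are ramp, clamped to [0, 1].
    ramp_share = min(1.0, max(0.0, ramp_points / total_points))
    # No ramp leaves the multiplier alone; all ramp removes the reduction entirely.
    return curve_mult + (1.0 - curve_mult) * ramp_share
```

A deck with a 0.4 curve multiplier and half its points in ramp would land at 0.7 under this blend. A less generous version could cap how much of the penalty ramp is allowed to cancel.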

I think tutor quotient is the next algorithm I want to work on though so I'll probably do that before mucking with curves. I need to get way more tutors into the points db though.

re: 4c-edric

It may have a hard time closing but that deck probably destroys mid-powered decks. It has triumph and craterhoof to close, a great mana base, and tons of powerful removal and so on. I think its rating is probably closer to accurate than not.

I can tell you from experience that tymna the weaver plus a good removal suite will win games on its own, no matter what pile of crap you put in your deck

+-------------------------------------------------+
|            4cEdric Raw:729 Adj:474.01           |
+----------------------------+-----------+--------+
| Card Name                  | Commander | Points |
+----------------------------+-----------+--------+
| TYMNA THE WEAVER           | Yes       |    200 |
| KYDELE, CHOSEN OF KRUPHIX  | Yes       |      0 |
| MANA CRYPT                 | No        |    100 |
| SOL RING                   | No        |     85 |
| VAMPIRIC TUTOR             | No        |     60 |
| DEMONIC TUTOR              | No        |     60 |
| DARK CONFIDANT             | No        |     15 |
| BIRDS OF PARADISE          | No        |     15 |
| AVACYN'S PILGRIM           | No        |     15 |
| WORLDLY TUTOR              | No        |     15 |
| CITY OF BRASS              | No        |     12 |
| MANA CONFLUENCE            | No        |     12 |
| ELADAMRI'S CALL            | No        |     10 |
| ELVISH MYSTIC              | No        |     10 |
| FYNDHORN ELVES             | No        |     10 |
| COMMAND TOWER              | No        |     10 |
| FLOODED STRAND             | No        |     10 |
| MISTY RAINFOREST           | No        |     10 |
| POLLUTED DELTA             | No        |     10 |
| SCALDING TARN              | No        |     10 |
| TOXIC DELUGE               | No        |      5 |
| SWORDS TO PLOWSHARES       | No        |      5 |
| PATH TO EXILE              | No        |      5 |
| LINVALA, KEEPER OF SILENCE | No        |      5 |
| BURGEONING                 | No        |      5 |
| BEAST WITHIN               | No        |      5 |
| ARID MESA                  | No        |      5 |
| BLOODSTAINED MIRE          | No        |      5 |
| MARSH FLATS                | No        |      5 |
| VERDANT CATACOMBS          | No        |      5 |
| WINDSWEPT HEATH            | No        |      5 |
| WOODED FOOTHILLS           | No        |      5 |
+----------------------------+-----------+--------+

Post by **Rumpy5897** » 4 years ago

The "pile of removal" aspect was a leftover of MarduSkies. I learned what you just said pretty quickly - Tymna plus evasive tickles equals draws for days, equals trigger-happy hurling of spot removal. Still, that deck closed pretty easily with the Bruse side of things. This deck never really did anything by comparison.

The logistic curve function could still hold for tutoring; experiment with the k and x0 to get whatever you're after shape-wise. It will probably need a low positive value for k to capture the dynamics, as the X axis will be points and will therefore have a wider range.
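As a sketch of what that might look like: the same formula, fed tutor points instead of CMC, with a small positive k so the multiplier rises with more tutoring. The name `tutor_multiplier` and the defaults `k=0.02`, `x0=100` are placeholders picked purely for illustration, not tuned values.

```python
import math

def tutor_multiplier(tutor_points, k=0.02, x0=100):
    """Logistic growth in tutor points; k and x0 are placeholder values."""
    return 1 / (1 + math.exp(k * (x0 - tutor_points)))

# Tutor points from the 4cEdric report: Vampiric 60, Demonic 60, Worldly 15, Eladamri's Call 10.
print(round(tutor_multiplier(60 + 60 + 15 + 10), 3))
```

Because points span hundreds rather than single digits, a k near 1 would turn this into a near step function at x0, hence the much smaller value.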

Post by **Rorseph** » 4 years ago

@Rumpy5897 and @pokken - Thanks for putting your nuts and bolts conversation around the Python. As a Python neophyte who works part of their time in data analysis, it has been very enlightening!

So, out of curiosity, where do we think the numbers map to a "1-10" Scale at this point? Or is it not something we can pin down yet?

Post by **plushpenguin** » 4 years ago

Yeah, you're going to need to assume to some degree that players who put stuff into their decks have the required support to make it work.

The algorithm will always have issues with say.. people who put Survival of the Fittest into a deck with 10 creatures, but not a whole lot you can do about it. It also doesn't take into account the choosing of options that are "slightly worse overall" but better in that specific deck. Good luck trying to input a formula for that however. You'll be doing this for days if you go that far.

For example, Nature's Lore is probably slightly worse than a talisman (costs a green) but you want it over the talisman if you're on Null Rod.

Post by **pokken** » 4 years ago

Rorseph wrote: ↑
4 years ago
Rumpy5897 and pokken - Thanks for putting your nuts and bolts conversation around the Python. As a Python neophyte who works part of their time in data analysis, it has been very enlightening!

So, out of curiosity, where do we think the numbers map to a "1-10" Scale at this point? Or is it not something we can pin down yet?

Bearing in mind that there's a HUGE gap in card point assignments and that numbers are changing regularly because of changes in the multiplier, with the current list:

>1000 = CEDH (10)
500-999 = 8-9
150-500 seems to encompass the 5 to 7 range but there's the most error down here.

Right now if a deck is over 400-500 you can bet it's probably pretty good (it requires a lot of good cards and a low curve to get there).

There are too many missing mid-power cards like craterhoof behemoth and fauna shaman and whatnot. And, most egregiously, missing commanders like Brago drive a lot of scores down.
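Those bands could be turned into a lookup along these lines. The thread leaves the edges ambiguous (500 appears in two bands, exactly 1000 in none) and gives nothing below 150, so the boundary choices and the placeholder label in this sketch are assumptions.

```python
def power_band(adjusted_score):
    """Map an adjusted score to the rough 1-10 bands quoted above."""
    if adjusted_score > 1000:
        return "10 (cEDH)"
    if adjusted_score >= 500:
        # Assumption: 500 and 1000 exactly fall into this band.
        return "8-9"
    if adjusted_score >= 150:
        return "5-7"
    # No band is given below 150, so this label is a placeholder.
    return "unrated (<150)"

for score in (205.09, 474.01, 729):
    print(score, power_band(score))
```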

plushpenguin wrote: ↑
4 years ago
Yeah, you're going to need to assume to some degree that players who put stuff into their decks have the required support to make it work.

The algorithm will always have issues with say.. people who put Survival of the Fittest into a deck with 10 creatures, but not a whole lot you can do about it. It also doesn't take into account the choosing of options that are "slightly worse overall" but better in that specific deck. Good luck trying to input a formula for that however. You'll be doing this for days if you go that far.

For example, Nature's Lore is probably slightly worse than a talisman (costs a green) but you want it over the talisman if you're on Null Rod.

Yeah, I feel like we can just assume a certain amount of baseline competence and coherency. I don't really have much interest in trying to analyze Chair Tribal and overrating it because they put a bunch of high powered chair cards in there or whatever.

Outliers gonna outly.

Post by **pokken** » 4 years ago

I'm going to start a short dataset action item log in the OP and if you have interest in attempting one of those tasks please post in the thread or PM me and I'll assign it to you.

* Gather list of missing commanders of scorable power level and draft scores
* Gather list of missing tutors of scorable power level including the niche ones like Stoneforge Mystic, and draft scores
* Update the list of Fast Mana and adjust all the points so that Cultivate is 1 and Mana Crypt is 100, moving points around wherever needed.
* Analyze any number of commanders' scorable synergy cards with point estimates, using my Ephara example as a decent starting point.
* Add two-card combos / strong synergies to the list and give draft points.
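For the Fast Mana item, a linear rescale is one possible starting point before moving individual cards around by hand. The helper name and the draft numbers in the example are invented; only the target endpoints (Cultivate at 1, Mana Crypt at 100) come from the list above.

```python
def rescale_points(points, low=1, high=100):
    """Linearly rescale so the smallest value maps to `low` and the largest to `high`."""
    lo, hi = min(points.values()), max(points.values())
    if hi == lo:
        # Every card has the same draft value; nothing to spread out.
        return {name: high for name in points}
    scale = (high - low) / (hi - lo)
    return {name: round(low + (value - lo) * scale) for name, value in points.items()}

# Hypothetical draft scores on some other scale.
draft = {"MANA CRYPT": 200, "SOL RING": 170, "CULTIVATE": 2}
print(rescale_points(draft))
```

A straight line keeps the relative gaps of the draft intact, so any card that feels wrong afterwards still needs a manual nudge.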

Post by **Rumpy5897** » 4 years ago

Yeah, this will definitely need to ingest a bunch of points for things. I generated reports for my meta decks to take a look at what was happening, and it's essentially a case of "I run fancy lands and sometimes dumb rocks and the rest of the group does not". The existing points will also need some adjustment, I don't really feel Mana Confluence (12) brings more to the table than a Cyclonic Rift or Finale of Devastation (both 10).

Easiest thing to do with reports would be to make a report directory and barf them there for later ingesting.
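Something like the following would cover the "barf them in a directory" part. The directory name `reports`, the one-file-per-deck layout, and both helper names are assumptions; the tool's real report format is whatever it already prints.

```python
from pathlib import Path

def save_report(deck_name, report_text, report_dir="reports"):
    """Write one deck report to <report_dir>/<deck_name>.txt and return the path."""
    directory = Path(report_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{deck_name}.txt"
    path.write_text(report_text, encoding="utf-8")
    return path

def load_reports(report_dir="reports"):
    """Read every saved report back as a {deck_name: text} mapping."""
    return {p.stem: p.read_text(encoding="utf-8")
            for p in sorted(Path(report_dir).glob("*.txt"))}
```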

Post by **pokken** » 4 years ago

Rumpy5897 wrote: ↑
4 years ago
Yeah, this will definitely need to ingest a bunch of points for things. I generated reports for my meta decks to take a look at what was happening, and it's essentially a case of "I run fancy lands and sometimes dumb rocks and the rest of the group does not". The existing points will also need some adjustment, I don't really feel Mana Confluence (12) brings more to the table than a Cyclonic Rift or Finale of Devastation (both 10).

Easiest thing to do with reports would be to make a report directory and barf them there for later ingesting.

Manabase quality is one of the hardest things to estimate the impact of. It's kinda like tutors: what you need to cast consistently, like what cards you have to tutor for, is a big determiner of how important a manabase is. Those numbers kind of assume that if you're playing a good manabase you will have a reason for it, but that might be hugely inaccurate.

That said I did err on the low side of the spectrum for most things. I think upping the points of interactive cards and cards capable of winning the game (e.g. finale) has to be in the cards.

If you want an edit link PM me your email and you can go nuts on it. I really am not attached to any of the point scores except trying to keep a ceiling of 100 so there's enough room for differentiation but playing a mana crypt doesn't completely skew things

Post by **Rumpy5897** » 4 years ago

I'm afraid of the responsibility of point values. Note I've never once made even a slight pass at it, just offering general ideas. I'll keep you posted if I come around to it.

Post by **pokken** » 4 years ago

Rumpy5897 wrote: ↑
4 years ago
I'm afraid of the responsibility of point values. Note I've never once made even a slight pass at it, just offering general ideas. I'll keep you posted if I come around to it.

If you want to take a swing at writing some algorithms, the 'deck' interface is very unlikely to change at this point. We really need to keep commanders and cards in separate dictionaries to organize them cleanly, so a deck should always have cards and commanders, and that's largely what any algorithms are going to be concerned with.
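For anyone writing against that interface, a stand-in along these lines might be enough to prototype with. The `Deck` dataclass, the per-card dict with a `cmc` key, and the two example functions are all hypothetical; only the "cards and commanders in separate dictionaries" part comes from the description above.

```python
from dataclasses import dataclass, field

@dataclass
class Deck:
    """Hypothetical stand-in for the tool's deck object."""
    name: str
    commanders: dict = field(default_factory=dict)  # card name -> card data
    cards: dict = field(default_factory=dict)       # card name -> card data

def commander_count(deck):
    """Example algorithm that only touches the commanders dictionary."""
    return len(deck.commanders)

def average_cmc(deck):
    """Average CMC over non-commander cards that carry a 'cmc' value."""
    costs = [data["cmc"] for data in deck.cards.values() if "cmc" in data]
    return sum(costs) / len(costs) if costs else 0.0

deck = Deck("example",
            commanders={"TYMNA THE WEAVER": {"cmc": 3}},
            cards={"SOL RING": {"cmc": 1}, "DEMONIC TUTOR": {"cmc": 2}})
print(commander_count(deck), average_cmc(deck))
```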

If you have requests for values to be included in the card import (e.g. oracle text, colors) lmk and I can surely do that. Trying to keep it as slim as possible hence not adding all fields, but can do anything needed.

I can also move the deck parser logic out of the module and into something you could import if that's helpful.

Post by **Dragonlover** » 4 years ago

Feel free to run my lists through it, do you need me to convert them to text files?

Dragonlover

Post by **pokken** » 4 years ago

Dragonlover wrote: ↑
4 years ago
Feel free to run my lists through it, do you need me to convert them to text files?

Dragonlover

Please do put them in text format, and ideally with your name in the filename, happy to add them but the effort adds up if I go manually scrape everyone's lists.

Thanks much.

Post by **Rumpy5897** » 4 years ago

With the robustness introduced to the parser after darrenhabib's lists, it's just a case of literally copying the innards of the [deck] tag from any threads into a text file and tossing it here as an attachment. Practical!
