Progress report on stuff!
Once again, and hopefully for the last time, the dataset structure was redesigned. The query language stays the same, but now everything lives together: single cards, combos, and various synergies that award points on a per-hit basis. After consulting with Feyd and pokken, the following column set was agreed on:
Synergy Target Points Commander_Points Tags
The commander points column only comes into effect if one of the cards is the deck's commander. It makes sense to have this split: Feather with Bandage is great with her at the helm, but considerably less great if it's a synergy in the 99. This way it takes just a single row in the dataset, plus it's good for boring reasons that will be explained later. Tags are manually curated flags, so Feather + Bandage would get tagged as draw.
At this point, given the hopeful finality of the dataset design, it'd be great to open the floor to outside contributions. The other guys are working on some sort of putative ingester of synergies in a format friendlier than just keying in JSON, but I think the existing system is perfectly legible and can be typed by hand quite easily. That said, as its creator, I may be biased.
- A dataset entry is made up of a query and, optionally, a target. Examples of a query without a target are giving points for a Mana Crypt, or various Kiki combos. Examples of a query with a target are Monastery Mentor plus noncreatures, or letting draw and ramp tags amplify each other.
- The system currently supports querying/targeting by: {"card", "tag", "type", "subtype", "color", "CMC", "text"}
- If using cards as a query condition, all of the cards should be part of the query and none in the target. This makes things run faster, and if you're checking Kiki against whichever one of his friends then you'll still need to hit both of them.
- The query/target are JSON formatted, without line breaks. The formatting is pretty intuitive, check out the current iteration of the TSV to get a feel for it:
- Cards and tags are mutually exclusive with anything else - if you have a card query, that's that. If you check for tags, that's that. But the other stuff can be combined into complex queries, as once again evidenced by the immortal Sunforger example:
{"card":["sunforger"]} {"type":["instant"],"cmc":["<5"],"color":["r|w"]} 2 0
- If providing multiple criteria for the target, they will be evaluated in an AND fashion. So all the text, type etc. criteria will have to hold true. There's OR support via the | symbol, as shown above - instants that are white or red will clear the hurdle. There's also negation, achieved by starting a phrase to search with !, as shown here with Monastery Mentor:
{"card":["monastery mentor"]} {"type":["!creature","!land"]} 1 0
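To make the AND / | / ! rules above concrete, here is a minimal sketch of how a target could be checked against a single card. It is not the tool's actual code: the card dictionary layout, the function names, and every cmc operator other than "<" (the only one shown above) are assumptions made for illustration.

```python
import json
import operator

# Comparison prefixes for "cmc" phrases. Only "<" appears in the post;
# the other operators are assumptions added for illustration.
CMC_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}


def cmc_matches(cmc, expr):
    """Check a mana value against an expression such as "<5" or "3"."""
    for symbol in ("<=", ">=", "<", ">"):  # two-character prefixes first
        if expr.startswith(symbol):
            return CMC_OPS[symbol](cmc, float(expr[len(symbol):]))
    return cmc == float(expr)


def phrase_matches(card, key, phrase):
    """Evaluate one phrase: a leading "!" negates, "|" separates alternatives."""
    negate = phrase.startswith("!")
    if negate:
        phrase = phrase[1:]
    alternatives = phrase.split("|")
    if key == "cmc":
        hit = any(cmc_matches(card["cmc"], alt) for alt in alternatives)
    else:
        # Set membership for type/subtype/color, substring search for text.
        hit = any(alt in card[key] for alt in alternatives)
    return hit != negate


def target_matches(card, target_json):
    """Every phrase under every key has to hold (AND)."""
    target = json.loads(target_json)
    return all(
        phrase_matches(card, key.lower(), phrase)
        for key, phrases in target.items()
        for phrase in phrases
    )


# Hypothetical card records, hand-written for this example.
bolt = {"type": {"instant"}, "subtype": set(), "color": {"r"}, "cmc": 1,
        "text": "deals 3 damage to any target"}
counterspell = {"type": {"instant"}, "subtype": set(), "color": {"u"}, "cmc": 2,
                "text": "counter target spell"}
elves = {"type": {"creature"}, "subtype": {"elf", "druid"}, "color": {"g"}, "cmc": 1,
         "text": "add g"}

sunforger_target = '{"type":["instant"],"cmc":["<5"],"color":["r|w"]}'
mentor_target = '{"type":["!creature","!land"]}'

print(target_matches(bolt, sunforger_target))          # True
print(target_matches(counterspell, sunforger_target))  # False
print(target_matches(elves, mentor_target))            # False
```

Storing type, subtype and color as sets keeps each alternative a plain membership test, while text falls back to a substring search through the same `in` operator.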
In a hypothetical world, the synergy dataset could grow to thousands of entries. It becomes critical to chew through that quickly if an EDHREC-sized calibration is in the cards, and pretty much the only things that show up in queries are cards or tags. As such, having the deck store its identified cards/tags as sets, having queries expose their desired cards/tags as sets too, and doing quick .issubset() checks is a good way to cheaply work out which of the synergies apply to the deck. Once the query clears, the target is still evaluated using the old logic if needed, as there's complexity to it that needs to be handled. Still, that's a reasonable sacrifice to make - I've managed to get Feather with her sea of synergies to clear an artificially inflated 40k card synergy dataset in 10 ms, as the bulk of the speed-up comes from quickly dismissing queries that don't apply, not from refining the subsequent hits.
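The set-based pre-filter described above could look roughly like this. It is a sketch under assumptions, not the real implementation: the entry layout, the function names, and the sample rows are made up for illustration, and target evaluation is left out entirely.

```python
import json


def query_sets(query_json):
    """Pull the card and tag requirements of a query out as frozensets."""
    query = json.loads(query_json)
    return frozenset(query.get("card", [])), frozenset(query.get("tag", []))


def prefilter(deck_cards, deck_tags, entries):
    """Keep only the entries whose query cards/tags are all present in the deck.

    Entries that survive would still go through the slower target logic.
    """
    survivors = []
    for entry in entries:
        cards, tags = entry["query_sets"]
        # An empty requirement set is a subset of anything, so it always passes.
        if cards.issubset(deck_cards) and tags.issubset(deck_tags):
            survivors.append(entry)
    return survivors


# Hypothetical dataset rows; the sets are precomputed once, at load time.
raw_queries = [
    '{"card":["sunforger"]}',
    '{"card":["kiki-jiki, mirror breaker","zealous conscripts"]}',
    '{"card":["mana crypt"]}',
]
entries = [{"query": q, "query_sets": query_sets(q)} for q in raw_queries]

deck_cards = {"sunforger", "mana crypt", "zealous conscripts"}
deck_tags = set()

hits = prefilter(deck_cards, deck_tags, entries)
print([entry["query"] for entry in hits])
```

Here the Kiki combo is dismissed because only one of its two cards is in the deck, which is the cheap rejection that does most of the work on a large dataset.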
The scoring got split up into two stages. Previously, tag queries could look straight at single-card data, but now they need to wait for all non-tag synergies to complete. This makes sense: as mentioned, Feather plus Bandage is draw, and the dataset now handles single-card points at the same time. So the queries with tags wait for the non-tag synergies to go through and apply their tags, creating a data frame with one row per identified synergy and tags ripe for the picking, and then .issubset() themselves against that column. In the interest of run time, there's also an initial check against a master union of all the deck's identified synergy tags, to similarly weed out any tag queries that won't resolve.
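The second stage could be sketched as follows. To stay self-contained it uses a plain list of dicts in place of the data frame, and the row layout, names, and tag assignments are all assumptions for illustration; points and target handling are omitted.

```python
def resolve_tag_queries(synergy_rows, tag_queries):
    """Match tag queries against the rows produced by the non-tag stage.

    synergy_rows: one dict per identified synergy, with a "tags" set.
    tag_queries:  dicts with a "name" and the frozenset of "tags" they need.
    """
    # Master union of every tag the first stage attached to the deck.
    all_tags = set().union(*(row["tags"] for row in synergy_rows))

    matches = []
    for query in tag_queries:
        # Cheap early exit: the query needs a tag the deck never produced.
        if not query["tags"].issubset(all_tags):
            continue
        for row in synergy_rows:
            if query["tags"].issubset(row["tags"]):
                matches.append((query["name"], row["synergy"]))
    return matches


# Hypothetical output of stage one (cards and non-tag synergies already scored).
synergy_rows = [
    {"synergy": "feather + bandage", "tags": {"draw"}},
    {"synergy": "mana crypt", "tags": {"ramp"}},
]
# Hypothetical tag queries waiting for stage two.
tag_queries = [
    {"name": "draw payoff", "tags": frozenset({"draw"})},
    {"name": "tutor payoff", "tags": frozenset({"tutor"})},
]

print(resolve_tag_queries(synergy_rows, tag_queries))
```

The "tutor payoff" query never reaches the per-row loop, since no identified synergy in this deck carries that tag.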