Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon.

Pages: 1-

efficient schema design [poké]

Name: Anonymous 2010-06-14 17:07

Hello /prog/, I am a Pokétrainer-turned-businessman. I run a Pokébusiness where I license access to Pokémon via PC.

A big part of my business's success is that I license Pokémon individually, but also by package deal. For example, I have a Grass family package, a Fire family package, and so on. In fact, in my system, all my Pokémon are hierarchically organized this way (and in case you're wondering, with dual-type Pokémon, I only go by the first type). So if a customer licenses one Charizard, the system understands that the customer is actually singling out a Pokémon from the Fire family. Pokémon (category) -> Fire (subcategory) -> Charizard (product). It's a multi-key index!

So I need to track my Pokélicenses in my RDBMS, but I'm having trouble designing the schema correctly, as I'm unsure about unnecessary repetition of data. The thing is, my Fire family may increase in the future, as I go on to catch more Pokémon and add them to my company's collection. (I have people out hunting for Magmar at Pokémon Mansion as we speak!) But if I add a Magmar to the Fire family, that doesn't mean that everyone who has purchased the Fire family package deal beforehand suddenly gets a license to choose Magmar. No sir! That means when tracking Pokélicenses, jotting down just the Family is not sufficient when handling package deals (category -> subcategory). I need to write down what was in the Family when they bought it (category -> subcategory -> product)!


+--------------+
| licenses     |
+--------------+
| id           |
| client_id    |
| created      |
+--------------+

+----------------+
| license_items  |
+----------------+
| id             |
| license_id     |
| category_id    |
| subcategory_id |
| product_id     | < NOT NULL!
+----------------+


Like so. So a license might look like this:



id   client_id  created
---  ---------  -------------------
121  1031       2009-11-12 12:07:39

id  license_id  category_id  subcategory_id  product_id 
--  ----------  -----------  --------------  ----------
1   121         1 (Pokémon)  2 (Grass)       1 (Bellsprout)
2   121         1            2               2 (Bulbasaur)
3   121         1            2               3 (Ivysaur)
4   121         1            2               4 (Oddish)


Well here's the problem. As I said, the Family deals are the biggest reason I'm in business. So I imagine that there will be a lot of licenses where the same entries, in the same order, will appear over and over -- until I add to the Family, at which point it'll be mostly the same, except with one more line. The database will grow quickly with a lot of redundant info.

But is that actually a problem? I'm obsessed with cutting down redundancy, but I fear that I'm actually worrying too much. Also, if there is a better way, I'm not sure what it is. I considered another table, but it quickly became an abomination responsible for tracking subtle permutations between Families and various collections of Pokémon. What a nightmare!

So, any ideas? Am I on the right Poképath, or am I being denormalized?!?

Oh, and excuse me if you feel I've been verbose in explaining this issue. I'm a friendly businessman, you see. Cheers!

Name: Anonymous 2010-06-14 17:19

+----------------+
| pokemon        |
+----------------+
| id             |
| category_id    |
| subcategory_id |
+----------------+

+----------------+
| license_items  |
+----------------+
| id             |
| license_id     |
| poke_id        |
+----------------+

and a join would look better to me.

Name: Anonymous 2010-06-14 17:29

Inefficient due to the fact that you deal only Pokemon, leading to only one main category.

Name: Anonymous 2010-06-14 17:38

>>3

Ah yes, I do already have a Pokemon table.


+----------------+-----------------------------+
| products       | type                        |
+----------------+-----------------------------+
| category_id    | INT(3) UNSIGNED PRIMARY KEY |
| subcategory_id | INT(3) UNSIGNED PRIMARY KEY |
| product_id     | INT(3) UNSIGNED PRIMARY KEY |
| name           | VARCHAR(100)                |
| base_price     | DECIMAL(8, 2)               |
+----------------+-----------------------------+


So, the reason for category_id and subcategory_id being in the license_items table is that all the keys in the multi-key index are necessary to identify a given Pokemon.


category_id  subcategory_id  product_id    name          base_price
-----------  --------------  ----------    ------------  ----------
1            1               1             "Charmander"  15.00
1            1               2             "Charizard"   30.00
1            2               1             "Bellsprout"  10.00
1            2               2             "Bulbasaur"   15.00
1            2               3             "Ivysaur"     20.00
1            2               4             "Oddish"      10.00



Not sure if this helps!

Name: Anonymous 2010-06-14 17:40

>>3
Well, we also sell Pokéthemed t-shirts! And stickers! (categories 2 and 3 respectively.) But those don't get licensed. There's a field within the categories table that differentiates between electronic (licensed) and physical goods.

Name: Anonymous 2010-06-14 18:22

>>4
Rather than continue criticism of the mono-categorization system you have implemented, I might point out that your business savvy looks pretty fucking terrible from my viewpoint if you're offering an Ivysaur at only 5 credits above the base price you charge for its previous incarnation.

Name: Anonymous 2010-06-14 18:39

>>1
with dual-type Pokémon, I only go by the first type
Half of the problem stems from this abysmal decision.

+--------------+
| element      |
+--------------+
| id           |
| name         |
+--------------+

+--------------+
| pokemon      |
+--------------+
| id           |
| name         |
+--------------+

+--------------+
| pokemontypes |
+--------------+
| id           |
| pokemon_id   |
| element_id   |
+--------------+

INSERT INTO element (id, name) VALUES(1, "fire"),(2, "water"),(3, "flying"),(4, "grass");
INSERT INTO pokemon (id, name) VALUES(1, "charizard");
INSERT INTO pokemontypes (pokemon_id, element_id) VALUES(1, 1),(1,3);

As for the licenses, depending on the requirements (they aren't specified that well) you could go with either a pretty simple, or a flexible solution. For the simple solution-

+--------------+
| pokemon      |
+--------------+
| id           |
| name         |
| owner        |
| date_added   | <-- date added to store (timestamp)
+--------------+

+--------------+
| license      |
+--------------+
| id           |
| player_id    |
| element_id   |
| date_bought  | <-- date license bought (timestamp)
+--------------+

Given a player id, find the names of pokemon they own:

SELECT distinct pokemon.name AS name FROM pokemon LEFT JOIN pokemontypes ON pokemon.id = pokemontypes.pokemon_id LEFT JOIN license ON pokemontypes.element_id = license.element_id WHERE license.player_id = %PLAYERID AND pokemon.date_added <= license.date_bought;

Name: Anonymous 2010-06-14 18:49

>>7
I think you are slightly confused. >>1-san deals with customers, not players.
But yeah, I'd suggest creating a timestamp too.
OP could even go a bit further and instead store changes to the Pokémon inventory, as in:

+----------------+
| pokemon_entry  |
+----------------+
| (species data) |
| date           |
| added/deleted  |
+----------------+

This is more complex because it would require "playing back" all the changes to find the exact contents of the package the customer licensed, but it would make the system much more robust and ready to help >>1-san leverage his core skillsets and expand his Pokébusiness.

Name: Anonymous 2010-06-14 20:43

>>6
Yes, well, the economy is pretty unpredictable here in Vermillion City. Supply and demand fluctuates so quickly. Bigger is not always better. I can hardly keep track of it myself!

>>8
I'll have to remember this technique for later. Here it may be overkill, but I think there is another place in this schema design that could make great use of a playback mechanic!

>>7
Half of the problem stems from this abysmal decision.
Actually, I don't really sell Pokemon. This is all just a metaphor for the business I'm actually designing for. How sad.... Anyway, in this business, the hierarchy used to organize products doesn't actually have multiple types, unlike Pokemon. Regardless, your second suggestion sounds like it could work!

I do still need to track individual Pokemon purchases however, because a trainer might be interested in, say,  just a Charizard. So I could end up with something like this:


+--------------+
| licenses     |
+--------------+
| id           |
| client_id    |
| created      | < DATETIME
+--------------+

+----------------+
| license_items  |
+----------------+
| id             |
| license_id     |
| category_id    |
| subcategory_id |
| product_id     | < DEFAULT NULL
+----------------+

+----------------+
| products       |
+----------------+
| category_id    |
| subcategory_id |
| product_id     |
| created        | < DATETIME
| name           |
| base_price     |
+----------------+


...very similar to what you propose, which would allow me to conclude that if the product_id is NULL, then that row refers to a licensing of an entire Family (subcategory) -- from here, I can check all the Pokemon in that Family against the timestamp as you demonstrate.

Luckily, the design of this table introduces what I thought was a flaw, but is a actually a feature: for example, I could have the entire Fire Family in one license, but also have a second row in the same license for a single Charizard -- which would seem redundant at first, like something that the design of the table should not allow. But actually, there is the possibility that a customer would like to order some extra backup Charizards on top of their basic full Fire family order. Narrowly dodged a hyper beam right there!

This table's behavior is unruly like this because -- I think this is why, anyway -- I'm forcing the use of program logic to determine the meaning of things in the table, rather than having it be explicitly determined by table structure. (I am arbitrarily deciding that a NULL product_id means something significant, rather than it simply being missing data.) It seems like it will work in this case, and I think it's what I'm going to do -- but generally speaking, is this bad design?

Name: Anonymous 2010-06-14 21:01

>>9
In this scenario, why do you even need to explicitly acknowledge subcategory purchases? Why don't you just add every item in a subcategory to the license_items table when they purchase the entire subcategory and call it a day?

Name: Anonymous 2010-06-14 21:03

I wish my DBS instructor used such entertaining examples...

Name: Anonymous 2010-06-14 21:14

>>10
I could do that! But it brings me back to the concern from >>1, which is that this design would produce a lot of rows -- an average of about 18 per license in the license_items table.

Maybe it's not an issue? I would like to think that I'm conscientious about the amount of space I take up -- we've got an environment to look after and all that -- but I also don't want to be oversharpening an already sharpened pencil.

>>11
The schools have already turned to Java only. All hope for them is lost.

Name: Anonymous 2010-06-14 21:25

>>12
It's really a non-issue unless you are going to be running a major service with potentially billions of rows. It's actually the best option regardless of simplicity because of the flexibility it allows you in creating different packages. If you really think it's necessary, you could create a "packages" table where each package has a number of pokemon in it, and assign these as well.

Name: Anonymous 2010-06-14 21:28

>>9
for example, I could have the entire Fire Family in one license, but also have a second row in the same license for a single Charizard
That's not a good idea. What if the trainer would like more than one additional Pokémon (you'd need eight Eevees if you want to have at least one Pokémon of every species)? Would you add one combined Normal-type + Eevee license, and six separate Eevee licenses, or rather one Normal-type license and seven additional Eevees?
I think that adding singular licenses for every item would be a good option. Pokémon professors all over the world constantly find new or recategorize old pokémon, but you wouldn't then have to worry about any changes to the Pokédex messing up the existing licenses. You could perhaps then have a family_pack? field in the license record, so that you could group all the licenses bought at the same time.
Or maybe you should create separate tables for family-packs and singular licenses, because they are not equivalent.

Name: Anonymous 2010-06-15 0:35

You're essentially describing a users, groups, and permissions scenario (less fun, I know). Default is deny. In that case I'm not seeing why you need a separate license and license_items table. Just combine the two. It probably seems like your program is doing more than it should because the thing that you're modeling is rather abstract--that is, the right to use a certain Pokémon or group of Pokémon.

Unless of course, you need some kind of auditing. If you don't want the addition of a Pokémon to a group to be reflected in a user's licensed Pokémon, you're down to copying the whole list in.

Name: Anonymous 2010-06-15 0:44

First type only?

What about flying?

There is currently NO flying/(anything) type. Flying is a secondary type no matter what.

Name: Anonymous 2010-06-15 17:21

Stopped reading at ``cheers''.

Name: Anonymous 2011-02-03 2:04


Don't change these.
Name: Email:
Entire Thread Thread List