Database design and a question to users

17Jun08

Things have been slowly moving forward. I’ve been away for 5 days, visiting friends and family. Got sick when I came back, so havent done much since then either. Hopefully I will feel better tomorrow.

Anyways me and Robert Kaye have been working on updating the data model I created for my project proposal. One question arose… Should one be able to separately specify artists to watch for new releases of and artists to view missing releases of? Of course it is nice to be able to specify more, but it would only be annoying if one always want to both “watch” and display missing releases of same artists. What do YOU think? Please leave a comment! And also… Do you want to be able to specify release types(official, bootleg, EP, single etc…) to ignore separately for “watching” and viewing missing releases? Or do you prefer same setting for both?


(Click the image to view it in its original size)

This data model facilitates the “must have” features listed for my project(functionality.rtf) and some of the other functionality as well. More attributes and possibly tables can be added later if time allows me to implement more functionality.

I will put together a more extensive description of the data model for the wiki later when it has been more finalized but until then… First of all, collection_info is the most interesting table. I think you can figure out most attributes. ignorecollectionattributes and ignorewatchattributes are lists of the release types to ignore, the ones to ignore when “watching” for releases and those to ignore when displaying missing releases in the user’s collection.

Oh yes and by the way, “join tables” are blue, the other music collection related tables are green and the already existing tables that the music collection make use of are in the orange rectangle.

collection_watch_artist_join is a join table used for knowing which artists the user want to watch for new releases of, while collection_discography_artist_join knows about which artists the user want to see missing releases of.

collection_ignore_release_join store releases that the user want to ignore “missing status” of – so that the website wont display those releases as missing even though they might be. And collection_has_release_join knows about which releases the user have in his/her collection.

As usual, please give positive/negative feedback! The latter is even more appreciated, though I also appreciate positive feedback 🙂

Advertisements


14 Responses to “Database design and a question to users”

  1. 1 ijabz

    Hey Niklas.

    I dont fully understand this so sorry if Im barking up the wrong tree with some of these comments.

    1. What would help me and other potential reviewers would be if so did a diagram of the database instance, i.e showing the tables with a few rows of data in them makes it easier to understand.

    2. What defines a ‘new’ release, I assume its a release that has been added to Musicbrainz since a particular time , would it be shown if the release was only added to Musicbrainz yesterday , but the release date was 5 years ago ? I dont think there is a need for a ‘Watch Artist’ and ‘Discography Artist’ Table because any releases added to Musicbrainz will automatically become Missing releases as you wont have been able to identify them in your collection beforehand. If a user really wanted to say only show ‘New releases for Artist B’ this could be implemented on the Website/Desktop Tool by just filtering the results to hide releases based on the release date dynamically, this would give the user better flexibility. If you feel that you still need these tables I would have a single table with multiple attributes as follows:

    Watch Artist
    —————-
    Id INTEGER (PK)
    Collection Info INTEGER (FK)
    Artist Id INTEGER (FK)
    Show New Releases (BOOLEAN)
    Show Missing Releases (BOOLEAN)

    This is easier to understand and saves table space, you might only need one boolean if you assume the user always want one of the options.

    3. Primary Keys in Join tables
    I wouldn’t bother with primary keys in the Join tables, because nothing is going to use and an index based on this primary key.

    4. In Collection table in the same vein I cant see why you would want to ignore different release types for New releases then for Missing releases. I think the name would be better if it was something like ignoreReleaseTypes.

    5.. but I think user would like to ignore different releases statuses as well (Bootleg,Promotion) so I would add this additional attribute to the table.

    6.. and although the NGS isnt there to help are you going to only display one version of a release or all of them, personally I dont want see 30 different versions of the same album.

    7… and they may only want to see albums released in a particular country ( this is one way of going towards achieving 6)

    8. In Your Functionality.rtf doc you talk about certain things at the Song/Track level , but the db schema only goes down to the Release level, has the track stuff been dropped ?

    9. I dont understand how the time ranges tables are used
    cheers Paul

  2. 2 niklasbe

    Hi
    Thanks for the comment loaded with lots of feedback!

    “1. What would help me and other potential reviewers would be if so did a diagram of the database instance, i.e showing the tables with a few rows of data in them makes it easier to understand.”

    Aha. I have been thinking about making some additional diagram to go with this, but I would rather not make one of an instance with example data. I think it would be still hard to understand as theres mostly keys in there, and a few attributes. I’ll explain the attributes instead. Perhaps my brief explanation was too brief 🙂 Ive been thinking about some additional diagram to help explain it. But Im uncertain whether it is necessary? It is not a big diagram and I think it will be easy to understand when there is a good description to go with it. If another diagram would be desirable – perhaps something like http://img66.imageshack.us/img66/33/datamodel2ignorerangeac5.png would be easier to read? Note that this is an old outdated diagram. This diagram is also a data model, but without the join tables.

    Description of some attributes in collection_info:
    lastcheck – there will be a Perl script ran daily which will send e-mails with notifications. it needs to know when it last checked for new releases, thats what this attribute is for.
    publiccollection – can be viewed by the public or not?
    notificationinterval – specify how many days in advance the user want to be notified about upcoming releases

    The collection_ignore_time_range table is there to keep track of time ranges to ignore. So that for example one can select to ignore releases prior to 1980. This is per collection…

    The join table between collection_ignore_time_range and collection_info was a mistake… I’m updating the link in the post right away to an updated diagram.

    “2. What defines a ‘new’ release, I assume its a release that has been added to Musicbrainz since a particular time , would it be shown if the release was only added to Musicbrainz yesterday , but the release date was 5 years ago ?”

    Im looking at the release date, not the date when the release was added to the database.

    “I dont think there is a need for a ‘Watch Artist’ and ‘Discography Artist’ Table because …”

    Not sure I understand what you mean. Those tables are used for keeping track of which artists the user want to “watch” and view missing releases of. Has to be stored somewhere.

    Those two tables might be merged into one depending on the user input on my question. Using a boolean attribute like that would be one idea.. But Im not sure that would be a good design? As I see it that would add complexity and make it more error prone? Not sure though… Plus in the future one might want to add attributes to those join tables. E.g. if the ignore time range functionality would be both per artist and specified for either collection or watch – it should have a relation to one of those tables.

    “3. Primary Keys in Join tables
    I wouldn’t bother with primary keys in the Join tables, because nothing is going to use and an index based on this primary key.”

    Its as far as I can see according to the conventions in the current musicbrainz database schema.

    “4. In Collection table in the same vein I cant see why you would want to ignore different release types for New releases then for Missing releases. I think the name would be better if it was something like ignoreReleaseTypes.”

    Hmm perhaps one wouldnt want that. I dont know. Again depends on feedback from users. Will edit the post and ask about this as well.

    Do you mean you prefer types instead of attributes? It is attributes because thats what it is called in the database, but I believe its type or something similar in the web interface.

    “5.. but I think user would like to ignore different releases statuses as well (Bootleg,Promotion) so I would add this additional attribute to the table.”

    This is included in the ignorecollectionattributes and ignorewatchattributes lists.

    “6.. and although the NGS isnt there to help are you going to only display one version of a release or all of them, personally I dont want see 30 different versions of the same album.”

    The earliest release will be shown.

    “7… and they may only want to see albums released in a particular country ( this is one way of going towards achieving 6)”

    Okay, added this as to the list of functionality that might be implemented.

    “8. In Your Functionality.rtf doc you talk about certain things at the Song/Track level , but the db schema only goes down to the Release level, has the track stuff been dropped ?”

    Yes, track stuff has been dropped. Functionality.rtf has also been updated to reflect this. I should have mentioned it in the blog post. Will update!

    The answer for your 9th question is in my reply to the 1st question. Thank you for the feedback!

  3. 3 CatCat

    ok, first forgive this non-coder his ignorance at all this.

    but my two kroner about this would be that I would like to have separate settings on artists in my “list” (or separate lists where artists could be both on list a and list b but also for artists to be on either or)
    for artists where i want to fill out “holes” in my collection but i don’t care about “new” releases (which would consist on either new collection albums ad nausium, or new material without the same style) this is true for alot of the music *I* atleast listen to.
    other times i would like to do both. an artist still active and fairly new. which needs filling out “in both ends” so to speak

    also it might be possible that an artist’s new releases are what is interesting and the older “holey” discography is of no interest

    sorry if I understood stuff wrong. keep up the good work 😀

  4. 4 ijabz

    1. “Aha. I have been thinking about making some additional diagram to go with this, but I would rather not make one of an instance with example data. I think it would be still hard to understand as theres mostly keys in there, and a few attributes”

    The reason I like instance diagrams is that it shows others what you are actually going to do , whereas descriptions about what you are going to do can be misunderstood (but of course its more work for you). For example and instance diagram would have meant I would have realised that release types and releases status are all held as attributes, mind you I could have checked a bit more thoroughly myself before posting 😉
    2. When I said “I dont think there is a need for a ‘Watch Artist’ and ‘Discography Artist’ Table because …” I meant I dont think we need both – not that we dont need either of them. My main point was I dont see the need to seperate out Missing Releases for an artist and new Releases for Artist but CatCat has since come up with valid use cases.

    “Plus in the future one might want to add attributes to those join tables. E.g. if the ignore time range functionality would be both per artist and specified for either collection or watch – it should have a relation to one of those tables.”

    I dont see that it makes sense to specify a date range on which to ignore ‘New releases’ because new releases are always going to be approximately now arent they? But your comments reminds me I do think you should specify the time range at an artist level from the start rather than collection level. I might like to say ‘Ignore Releases by David Bowie’ after 1980 but I cant see that I would just want to say ‘Ignore all releases by everyone ‘ after 1980

    4. “Do you mean you prefer types instead of attributes”
    I was speaking from the webservice side, yes you are right they are called attributes in the db ..
    5. …. and these are also included in attributes – I didnt realise because they are seperate at the webservice level.

    6.. “The earliest release will be shown.” – But some people might want to see all releases, perhaps this should be an option.

    8.. “Yes, track stuff has been dropped. Functionality.rtf has also been updated to reflect this. ”

    I was using the link at the top of the post it still says
    – Exclusion – Exclude songs

    10. What decides whether you have a release, do you have to have all the tracks or just some of the tracks ?

    Cheers Paul

  5. 5 niklasbe

    “The reason I like instance diagrams is that it shows others what you are actually going to do , whereas descriptions about what you are going to do can be misunderstood (but of course its more work for you)”

    Could be one way to describe it. Im sceptical though because there’s mostly keys and foreign keys.

    Suggestions on how to describe it better is very welcome though! thanks.

    “I dont see that it makes sense to specify a date range on which to ignore ‘New releases’ because new releases are always going to be approximately now arent they?”

    I was not thinking that through 🙂 That would make no sense at all.

    “But your comments reminds me I do think you should specify the time range at an artist level from the start rather than collection level.”

    Yes, ignoring time ranges at artist level would be desirable. We decided to keep it simple to start with.

    “6.. “The earliest release will be shown.” – But some people might want to see all releases, perhaps this should be an option.”

    Perhap. I have added this option to the list.

    “8.. “Yes, track stuff has been dropped. Functionality.rtf has also been updated to reflect this. ”

    I was using the link at the top of the post it still says
    – Exclusion – Exclude songs”

    Oops! Updated.

    “10. What decides whether you have a release, do you have to have all the tracks or just some of the tracks ?”

    Just one for now.

    Thank you for your feedback!

  6. 6 ijabz

    “Yes, ignoring time ranges at artist level would be desirable. We decided to keep it simple to start with.”

    I just wonder if time ranges at collection level are much use, if it was me I would create the database to support time ranges at artist level but in order to keep it simple I wouldn’t actually implement this feature in the first release (because Modifying database schemas is always a bit of a pain.)

  7. 7 niklasbe

    I’ll take that into consideration, but leave the schema as it is. Exclusion of time ranges is not a “must have” functionality, I’ll reconsider it when it is (possibly)time to implement it.

  8. 8 djce

    At a glance, I’m having trouble understanding how the features will interact with each other, and therefore whether or not the data model is appropriate. I think it would help to change the rough outlined bullet-point feature list into a proper spec, which clearly states how the logic would work.

    For example:

    “Each moderator can create as many (or as few) collections as they like; each collection is owned by the moderator that created it, and can only be modified or deleted by that moderator. Each collection acts independently.

    A collection consists of a set of watched artists, a set of discography artists, and a a set of ignored time ranges. A release will be recommended for a collection if its artist is in the watched artists set and/or the discography artists set, but not if the first release date of the release is within any of the ignored time ranges.”

    That sort of thing. Does that make sense?

  9. 9 niklasbe

    Hmm like the data model but as written text? Does my first blog post(https://niklasbe.wordpress.com/2008/04/23/introduction-to-my-project/) help? Not a “proper spec”, but still.

    A user(moderator) has zero to one collections. Should probably be one though.

    Anything specific that youre wondering about?

  10. 10 djce

    Doesn’t help much, to be honest. If I had to guess how the schema is used to implement the idea as laid out in that earlier post, then (a) I’d have no idea what the two “ignore” tables are for, and (b) I wouldn’t know which of collection_watch_artist_join / collection_discography_artist_join is used to record which artists the user is interested in (so, what’s the other table for?).

    Anyway, clearly your idea has extended beyond what’s in your original blog post, so I think it’s worth static clearly what your intended functionality is to be. Such a document could either be, or server as the basis of, documentation for the feature; could serve as a guide to creating the code to implement the feaure; and it could help explain your idea to other people (including me 🙂 so that they can understand your idea and offer feedback.

  11. 11 niklasbe

    Alright, I see. I will put something together on the MusicBrainz wiki to describe things in a bit more detail. Will describe the project idea, and the data model for now.

    It will be announced in a new blog post 🙂 Soon I hope.

  12. 12 niklasbe

    Follow the link in my latest post to see the wiki page which has some more info. Please ask for more info if there’s something missing.

  13. hey dude
    Great stuff! i got a lot of inspiration from this post
    it is very interesting ….
    i went through this page two times
    am learning for social work

    Thanks


  1. 1 MusicBrainz Blog » Blog Archive » Discographies database schema review

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: