


Databases IT

Making Sense of the NoSQL Standouts 152

snydeq writes "InfoWorld's Peter Wayner provides an overview of the more compelling NoSQL data stores on offer today in hopes of helping IT pros get started experimenting with these powerful tools. From Cassandra, to MongoDB, to Neo4J, each appears geared for a particular set of application types, providing DBAs with a wealth of opportunity for experimentation, and a measure of confusion in finding the right tool for their environment. 'There are great advantages to this Babelization if the needs of your project fit the abilities of one of the new databases. If they line up well, the performance boosts can be incredible because the project developers aren't striving to build one Dreadnought to solve every problem,' Wayner writes. 'The experimentation is also fun because the designers don't feel compelled to make sure their data store is a drop-in replacement that speaks SQL like a native.'"

  • by just_another_sean ( 919159 ) on Thursday July 21, 2011 @02:09PM (#36837356) Journal

    Fewer ads.

    Print version [infoworld.com]

    • by drpimp ( 900837 )
      One word ...
      adblock
      or better
      lynx
  • not worth reading (Score:5, Informative)

    by rla3rd ( 596810 ) on Thursday July 21, 2011 @02:11PM (#36837384)
    Don't bother reading this fluff. Wikipedia offers a better overview: http://en.wikipedia.org/wiki/NoSQL [wikipedia.org]. Oh, I forgot, this is Slashdot; no one here reads the articles :).
    • I just read it for the centerfolds.
  • In b4... (Score:3, Informative)

    by Anonymous Coward on Thursday July 21, 2011 @02:31PM (#36837554)
    This discussion is likely to lean towards "OMG NoSQL IS SO RETARDED!". So let me just say that if you don't care about NoSQL, then fine. If MySQL/Postgres/Oracle/MS-SQL fit your needs, then fine.

    That doesn't mean "NoSQL" databases are useless.

    I've had exposure to both MongoDB and CouchDB so far. CouchDB is the newest experience, as part of a Chef installation. Yes, it is a very immature product, and yes, it has a long way to go, but it's very simple to configure and it does its job with very few resources. I don't personally have a need for CouchDB, but I can see why people use it for certain specific needs (i.e. I can understand why Chef uses it).

    MongoDB is a little marvel for certain applications. In my current and previous jobs we've used MongoDB for Syslog collection and SMTP mail logging. MongoDB is excellent for this sort of thing: each log entry is a single entry in the collection, the data is NOT relational in any interesting way and the insertion rate is far beyond anything a traditional relational database engine could manage on the same hardware at the same resource utilisation. Even better you can write some quite clever Map/Reduce functions on top that allow you to do some amazingly deep inspections of the log data, so you can produce on-demand data as well as graph out long term trends.
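The log-collection pattern described above (one schemaless document per log line, then a map/reduce pass for on-demand summaries) can be sketched in plain Python. This is not MongoDB's API: the field names `host`, `severity` and `ts` are hypothetical, and the reduce step simply counts entries per (host, severity) pair.

```python
from collections import defaultdict
from functools import reduce

# Each syslog line becomes one self-contained document. No schema is enforced,
# so an entry may carry extra fields (all field names here are hypothetical).
logs = [
    {"ts": "2011-07-21T14:00:01", "host": "mx1", "severity": "info", "msg": "delivered"},
    {"ts": "2011-07-21T14:00:02", "host": "mx1", "severity": "error", "msg": "bounce", "rcpt": "a@example.com"},
    {"ts": "2011-07-21T14:00:05", "host": "mx2", "severity": "info", "msg": "delivered"},
    {"ts": "2011-07-21T14:01:10", "host": "mx1", "severity": "info", "msg": "delivered"},
]

def map_entry(doc):
    """Map phase: emit one (key, value) pair per document."""
    return ((doc["host"], doc["severity"]), 1)

def reduce_counts(acc, pair):
    """Reduce phase: sum the emitted values for each key."""
    key, value = pair
    acc[key] += value
    return acc

counts = reduce(reduce_counts, map(map_entry, logs), defaultdict(int))
for (host, severity), n in sorted(counts.items()):
    print(host, severity, n)
# mx1 error 1
# mx1 info 2
# mx2 info 1
```

Swapping the key for an hour bucket taken from `ts` would give the long-term trend graphs the comment mentions.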

    NoSQL is NOT a replacement for traditional SQL databases, but it sure is useful for stuff where SQL databases struggle.
    • by Anonymous Coward

      NoSQL may not be retarded, but the article is. To start with, they didn't cover two of the major offerings, HBase and Project Voldemort. From everything I've read, Voldemort is one of the few that will actually scale really well, so ignoring it makes me really suspicious of the research that went into the article, and makes me think they're just trying to capitalize on the NoSQL buzzword by writing a brief summary of the first few options they found.

      • Can't talk about NoSQL databases without including PICK. Hell, it predates SQL by years. And scaling is what it does best.
      • Well technically you're not allowed to mention Project Voldemort by name, so they couldn't really cover it.
    • by mcmonkey ( 96054 )

      What's an example of where NoSQL is useful? I'm not a DBA or SQL guru, but I do work with traditional relational databases, and I'm having trouble thinking of a scenario where I'd want NoSQL.

      I did a little research and the example I found was Twitter, and it sounded like a mess. You have a list of feeds with their followers, and a list of followers with the feeds each follows. It sounds nice for finding who follows a feed or for finding which feeds someone is following.

      The issue I see is the duplication
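The two lists described here are a deliberate denormalization: each follow is written to both a "following" map and a "followers" map, so either question is answered by a single lookup. A minimal in-memory sketch, assuming both maps live in one process (a distributed store would have to replicate both writes, which is where the worry about the lists drifting apart comes from); the class and method names are hypothetical.

```python
from collections import defaultdict

class FollowGraph:
    """Keeps two denormalized indexes of the same follow relationship."""

    def __init__(self):
        self.following = defaultdict(set)   # user -> feeds the user follows
        self.followers = defaultdict(set)   # feed -> users following the feed

    def follow(self, user, feed):
        # Both indexes must be updated together; if one write is lost,
        # the two views of the relationship disagree.
        self.following[user].add(feed)
        self.followers[feed].add(user)

    def unfollow(self, user, feed):
        self.following[user].discard(feed)
        self.followers[feed].discard(user)

    def is_consistent(self):
        """Check that every (user, feed) edge appears in both indexes."""
        forward = {(u, f) for u, feeds in self.following.items() for f in feeds}
        backward = {(u, f) for f, users in self.followers.items() for u in users}
        return forward == backward

g = FollowGraph()
g.follow("alice", "ladygaga")
g.follow("bob", "ladygaga")
g.follow("alice", "paulmccartney")
print(sorted(g.followers["ladygaga"]))   # ['alice', 'bob']
print(sorted(g.following["alice"]))      # ['ladygaga', 'paulmccartney']
print(g.is_consistent())                 # True
```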

      • And what happens when the 2 lists get out of sync? How much extra resources are spent making sure the feeds-to-followers list is consistent with the followers-to-feeds?

        Who would care about that?

        If you and I follow Lady Gaga and Paul McCartney, and both of them publish a new tweet, you see Lady Gaga's new message before Paul's, and I see Paul's before Lady Gaga's ... who the fuck cares?

        Big volume NoSQL DBs have one goal: they are eventually consistent.

        It does not matter if I and you see the exact same result a

        • by mcmonkey ( 96054 )

          If you and I follow Lady Gaga and Paul McCartney, and both of them publish a new tweet, you see Lady Gaga's new message before Paul's, and I see Paul's before Lady Gaga's ... who the fuck cares?

          If the list of feeds I follow includes Lady Gaga, but the list of Lady Gaga followers does not include me, then when I check my account, it looks like I should get Gaga's tweets. But when Gaga tweets, it won't get sent to me.

          Big volume NoSQL DBs have one goal: they are eventually consistent.

          Ah, I get it now. It's perfect for something like Twitter, where your users are your product and your only goal is to maximize your number of users. This allows Twitter to handle the maximum number of feeds and subscribers by ignoring quality.

          But for a service where the users are the


          • Of course, that's only if I don't think about how Twitter is becoming part of the emergency warning system. If the campus PD are sending out an alert because of a Columbine or Virginia Tech type situation, I'd like to know sooner than "eventually".

            You know it soon enough. That is not the point.
            In this situation, let's assume 1000 people give a warning. Let's assume the cluster has 100 nodes, and the "persistence rule" is: if ten nodes have it stored, it is considered persistent (quorum).
            Now 10 thousands or
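The quorum rule in this reply (100 nodes, a write counts as persistent once 10 of them have stored it) can be simulated in a few lines. This is a toy model, not any particular database's replication protocol: node failures are hard-coded, and the placement rule (walk the ring of nodes starting at a hash of the key, skipping dead nodes) is an assumption made for illustration.

```python
import zlib

NODES = 100          # cluster size from the example
WRITE_QUORUM = 10    # acknowledgements needed before a write counts as persistent

def quorum_write(key, value, stores, down):
    """Walk the ring from the key's hash, storing on live nodes until
    WRITE_QUORUM have acknowledged. Returns True if the quorum was reached.
    On failure, the partial copies are left in place; a real system would
    have to repair or discard them."""
    start = zlib.crc32(key.encode()) % NODES   # deterministic, unlike hash() on str
    acks = 0
    for i in range(NODES):
        node = (start + i) % NODES
        if node in down:
            continue                           # unreachable node: try the next one
        stores[node][key] = value
        acks += 1
        if acks == WRITE_QUORUM:
            return True
    return False                               # fewer than WRITE_QUORUM live nodes

stores = [dict() for _ in range(NODES)]
down = set(range(0, NODES, 3))                 # hypothetical: every third node has failed
ok = quorum_write("alert:campus", "shelter in place", stores, down)
copies = sum("alert:campus" in s for s in stores)
print(ok, copies)  # True 10
```

Even with a third of the cluster down, the warning is durable as soon as ten live nodes hold it; readers on other nodes may simply see it a little later.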

      • by julesh ( 229690 )

        Any time you store a piece of information in 2 places, it's just a matter of time until the 2 don't agree.

        create table users (
        uid int not null primary key,
        username varchar(255) not null,
        passwordhash varchar(255) not null,
        unique (username)
        )

        When I insert into this table, a reference to the generated row is stored in two places: the primary key index and the unique username constraint index. Is it just a matter of time until the two don't agree?

        Why would this be different for a NoSQL system that stores information in two diff
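The point about the two indexes can be checked with SQLite, standing in here for any relational engine: the row, the primary-key index and the unique index on `username` are all updated by one atomic statement, so a rejected insert changes none of them. A minimal sketch using Python's built-in `sqlite3` module and the table from the comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table users (
        uid int not null primary key,
        username varchar(255) not null,
        passwordhash varchar(255) not null,
        unique (username)
    )
""")
conn.execute("insert into users values (1, 'julesh', 'x')")

try:
    # Duplicate username: the unique index rejects the row, the statement is
    # rolled back, and the primary-key index never gains uid 2 either.
    conn.execute("insert into users values (2, 'julesh', 'y')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(conn.execute("select count(*) from users").fetchone()[0])  # 1
print(conn.execute("pragma integrity_check").fetchone()[0])      # ok
```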

      • Well, the most basic use case is a key-value store. No relational overhead (and no SQL parsing!) means it can be blisteringly fast.

        In general, if your data is highly structured and internally consistent, you'll be well off with relational databases. If you want very fast lookups, your best option used to be a hierarchical database (LDAP, for instance), but that's a bit of a bugger for updates. NoSQL can also fit that bill, but there are quite a few very different implementations that make it more or less suit
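The difference in access path can be shown side by side: a key-value `get` is one hash lookup on the key, while even a primary-key query in SQL is handed to the engine as text that must be parsed and planned before the index is touched. A rough sketch using a plain Python `dict` as the key-value store and SQLite for the SQL side; it only shows that both interfaces return the same value and is not a benchmark.

```python
import sqlite3

# Key-value side: the key is all you need, and lookup is a single hash probe.
kv = {"user:42": "Paul McCartney"}

# SQL side: the same fact lives in a table and is reached through a query
# that the engine parses and plans before using the primary-key index.
db = sqlite3.connect(":memory:")
db.execute("create table users (id text primary key, name text)")
db.execute("insert into users values (?, ?)", ("user:42", "Paul McCartney"))

def kv_get(key):
    return kv.get(key)

def sql_get(key):
    row = db.execute("select name from users where id = ?", (key,)).fetchone()
    return row[0] if row else None

print(kv_get("user:42"), "|", sql_get("user:42"))  # Paul McCartney | Paul McCartney
print(kv_get("user:99"), "|", sql_get("user:99"))  # None | None
```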

  • Read Nati Shalom's blog for an interesting article (http://natishalom.typepad.com/nati_shaloms_blog/2011/07/real-time-analytics-for-big-data-an-alternative-approach.html) about how to implement an application using an In-Memory Data Grid as a front for the data and for real-time or near-real-time analytics. The data can be persisted to a SQL or NoSQL database of your choice, depending on what best suits your application's needs.
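The linked article isn't reproduced here, but the general idea of an in-memory layer in front of a persistent store can be sketched as a write-through cache: reads are served from memory when possible, and every write goes to both memory and the backing store. The `Backend` class is a hypothetical stand-in for whichever SQL or NoSQL database sits behind the grid, and write-through is only one of several possible persistence policies.

```python
class Backend:
    """Hypothetical persistent store (SQL or NoSQL) behind the grid."""

    def __init__(self):
        self.rows = {}
        self.reads = 0          # counts how often the grid had to fall back here

    def load(self, key):
        self.reads += 1
        return self.rows.get(key)

    def save(self, key, value):
        self.rows[key] = value


class WriteThroughGrid:
    """In-memory front: reads hit memory first, writes go to both layers."""

    def __init__(self, backend):
        self.backend = backend
        self.memory = {}

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        value = self.backend.load(key)      # cache miss: read from the store
        if value is not None:
            self.memory[key] = value        # keep it in memory for next time
        return value

    def put(self, key, value):
        self.memory[key] = value
        self.backend.save(key, value)       # persist synchronously (write-through)


grid = WriteThroughGrid(Backend())
grid.put("clicks:2011-07-21", 1500)
print(grid.get("clicks:2011-07-21"), grid.backend.reads)  # 1500 0
```

A write-behind variant would queue the `save` calls and flush them asynchronously, trading durability for lower write latency.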

  • by Anonymous Coward on Thursday July 21, 2011 @02:35PM (#36837594)

    Key-value store

    Key-value stores allow the application to store its data in a schema-less way. The data could be stored in a datatype of a programming language or an object. Because of this, there is no need for a fixed data model. This is generally of interest to friendless sperglords only.[16] The following types exist:

    Crowdsourcing at its finest. Although, I suppose the comment is accurate?

    • It has to be accurate; there's even a citation for it.

      Btw it's not there anymore (if it ever was).

      • Btw it's not there anymore (if it ever was).

        I can vouch for him, it most definitely was there ... I shoulda grabbed that screenshot I was gonna make

  • by mcrbids ( 148650 ) on Thursday July 21, 2011 @02:43PM (#36837652) Journal

    Sure, some solutions are faster than MySQL out of the box by skipping much of the language parsing and stuff that any SQL solution has to do. But that's not to say that they are actually more efficient at key retrieval.

    For example, one developer found that the best NoSQL solution was... MySQL, which excels at simple key retrieval [blogspot.com]. He was able to beat memcached by a factor of almost 2.

    Use the right tool for the job.

    • The issue with SQL is joins in particular. MySQL is not a NoSQL solution to this problem. If you do not use joins and just need a single database, you will be fine with traditional SQL; NoSQL won't be a benefit. If you host a simple website, you won't run into that scalability problem.

      Now imagine you're a systems analyst who needs joins to do things, such as comparing a pricing database with a sales order database to see if a discount worked and by how much. This is where you need a join. Now imagine the size of
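The analyst's question (did a discount work, and by how much?) is a natural fit for a join. A small sketch in SQLite via Python; the `pricing` and `sales_orders` tables, their columns and the sample rows are all hypothetical, and "worked" is measured crudely as average units per order with and without the discount.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    create table pricing (sku text, week int, discount_pct real);
    create table sales_orders (order_id int, sku text, week int, units int);
    insert into pricing values ('A', 1, 0), ('A', 2, 10);
    insert into sales_orders values
        (1, 'A', 1, 3), (2, 'A', 1, 5),
        (3, 'A', 2, 6), (4, 'A', 2, 8);
""")

# Join each order to the price in force that week, then compare
# average units per order with and without a discount.
rows = db.execute("""
    select p.discount_pct > 0 as discounted, avg(o.units) as avg_units
    from sales_orders o
    join pricing p on p.sku = o.sku and p.week = o.week
    group by discounted
    order by discounted
""").fetchall()

baseline, promo = rows[0][1], rows[1][1]
print(rows)                                                  # [(0, 4.0), (1, 7.0)]
print(f"lift: {100 * (promo - baseline) / baseline:.0f}%")   # lift: 75%
```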

    • Given how poorly some versions of MySQL support SQL itself, you could classify them as NoSQL as well :-)

    • I read his article.

      DON'T DO WHAT HE DID!

      Although his conclusion is sane, the way he went about it is overly complex.

      In his specific scenario (always running the same SQL queries by primary key, just with different parameters), he found that the CPU time spent on SQL parsing and query cost estimation was making MySQL's throughput CPU-bound.

      He then proceeded to "fix" this by getting a library that allows direct access to MySQL's underlying database, bypassing the SQL layer, and rewriting
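The rest of this comment is cut off, so what the author recommended instead isn't known. One common and far less invasive way to reduce repeated parsing cost is to reuse a single parameterized statement, so the engine can keep the compiled form rather than parse fresh SQL text for every key. A sketch with Python's `sqlite3`, which keeps a per-connection cache of compiled statements (sized by the `cached_statements` argument to `connect`); whether MySQL's own prepared statements would have closed the gap in that benchmark is an assumption, not something the comment states.

```python
import sqlite3

# cached_statements sets how many compiled statements the connection keeps.
db = sqlite3.connect(":memory:", cached_statements=128)
db.execute("create table kv (id integer primary key, val text)")
db.executemany("insert into kv values (?, ?)", [(i, f"v{i}") for i in range(1000)])

# Same SQL text every time; only the bound parameter changes, so the
# statement is compiled once and then reused from the cache.
LOOKUP = "select val from kv where id = ?"

def lookup(key):
    row = db.execute(LOOKUP, (key,)).fetchone()
    return row[0] if row else None

# For comparison: building new SQL text per key forces a fresh parse each
# time (and invites SQL injection if the key comes from untrusted input).
def lookup_formatted(key):
    row = db.execute(f"select val from kv where id = {int(key)}").fetchone()
    return row[0] if row else None

print([lookup(k) for k in (1, 500, 999)])  # ['v1', 'v500', 'v999']
```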

  • by Anonymous Coward

    First off, you have to really, really understand your dataset before committing to either an SQL or a NoSQL solution. This is because of the main theoretical difference, as I see it: in SQL, one basically generates a result set and the "game" is to find a particular record (or records) within that result set, whereas with NoSQL, you basically already have your "object" (or key) and the "game" is to find what the object connects to. A subtle yet extremely important difference.

    I'm towards the end of an 8-month project t
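The "you already have your object, now find what it connects to" style of query described here is easiest to see with a graph. A minimal adjacency-list sketch with a breadth-first search; the node names are made up, and dedicated graph stores expose this through their own query languages rather than code like this.

```python
from collections import deque

# Adjacency list: each key maps directly to the keys it connects to.
edges = {
    "order:7": ["customer:ann", "sku:A"],
    "customer:ann": ["store:north"],
    "sku:A": ["supplier:acme"],
    "store:north": [],
    "supplier:acme": [],
}

def reachable(start, max_hops):
    """Breadth-first search: every node within max_hops of start, with its distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue                      # don't expand past the hop limit
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return {n: d for n, d in seen.items() if n != start}

print(reachable("order:7", 1))  # {'customer:ann': 1, 'sku:A': 1}
print(reachable("order:7", 2))
```

Each hop is a direct lookup from a known key, instead of filtering a result set to find the related rows.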

  • by Manip ( 656104 ) on Thursday July 21, 2011 @03:05PM (#36837902)
    We want to jump on the NoSQL ship. I won't bore you with all of the details, but briefly put, SQL databases and tables are too restrictive for our work. Unfortunately, because there are SO many NoSQL solutions, and none of them are backed by big names, nobody here has the balls to sign off on one. Ironically, NoSQL's biggest downside is the lack of cross-compatibility: once you make that call, you're stuck with it, good or bad.

    The other issue is that because all of these solutions are relatively young, the toolsets simply don't exist for many of them: no libraries, backup solutions, third-party support, etc. I wish we'd see someone like Microsoft, Oracle, IBM, or any big name roll out some kind of complete solution (in particular, XML-compatible). I know a few big cloud solutions exist, but again we come back to being locked into a solution.
    • by bhcompy ( 1877290 ) on Thursday July 21, 2011 @03:26PM (#36838206)
      Not every solution is young. PICK is a NoSQL DB that predates SQL. Its descendants are supported and cross-compatible to a degree. NoSQL is a generic term; you need a specific database. For a PICK-based solution, I'd look at Reality. Reality has been around for decades, is well supported, and has many features for compatibility with modern databases and modern operating systems. OpenQM is GPL-licensed and of the same class. jBASE might be a more recognizable descendant.
    • by PCM2 ( 4486 )

      I won't bore you with all of the details but briefly put SQL databases and tables are too restrictive for our work.

      Care to make a case for that?

      Perhaps your work is too chaotic and disorganized for SQL tables?

      (BTW, C. J. Date would take issue with anyone who thinks "tables" are part of the relational model, but I digress...)

      • I doubt it is easy to make that case, as you put it. However, everyone I've met in the last 10 years who uses NoSQL DBs made a gut decision. When I visited the shop, they could always show me a few things where I agreed that it was not really possible to do them with a traditional SQL DB.

        E.g. when you have to write gigabytes per second to the DB, you are out of luck with any of today's RDBMSs.

        Keep in mind, NoSQL DBs are usually optimized for write performance and for the "exact retrieval path". There is no join involv
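A store optimized for writes and for an "exact retrieval path" can be as simple as an append-only log plus an in-memory index from key to file offset, similar in spirit to log-structured designs such as Bitcask: every write is a sequential append, and every read is one seek to a known position. This sketch writes to a temporary file; the record format (a 4-byte length prefix, no key stored on disk) and the class name are assumptions, and it omits the compaction and crash recovery a real engine needs.

```python
import os
import struct
import tempfile

class AppendOnlyStore:
    """Writes are sequential appends; reads seek straight to a known offset."""

    def __init__(self, path):
        self.f = open(path, "a+b")
        self.index = {}                          # key -> (offset, length) of latest value

    def put(self, key, value):
        data = value.encode()
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(struct.pack(">I", len(data)) + data)   # length prefix + payload
        self.f.flush()
        self.index[key] = (offset + 4, len(data))           # old values stay in the file

    def get(self, key):
        if key not in self.index:
            return None
        offset, length = self.index[key]
        self.f.seek(offset)                      # exact position, no scan or join
        return self.f.read(length).decode()

path = os.path.join(tempfile.mkdtemp(), "data.log")
store = AppendOnlyStore(path)
store.put("comment:1", "first!")
store.put("comment:2", "NoSQL is not only SQL")
store.put("comment:1", "edited")                 # an update is just another append
print(store.get("comment:1"), "|", store.get("comment:2"))  # edited | NoSQL is not only SQL
```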

        • by PCM2 ( 4486 )

          NoSQL is write once, never update but read often. SQL is read, write update all the time.

          And yet most MySQL installations (Web apps, anyway) are: read all the time; write some; update seldom. That's why MySQL became a popular database for Web apps -- it was faster for that model than Oracle (on the same hardware). SQL or the relational model wasn't the problem. The implementation was the problem.

          I'm sure there are some cases where NoSQL is absolutely game-changing -- but those cases seem rare, and where they have occurred, the companies that really need NoSQL seem to be the ones who invented it

          • Well, you basically got the point what NoSQL is all about.

            As I mentioned in a different post, "NoSQL" does not necessarily mean "no" SQL; mostly it refers to "not only" SQL and means you mix your storage strategies.

            Imagine Facebook, 100 million users concurrently online. 1 million of them are writing a 100-character comment on "something" per hour. That is 100 MB of data to store per hour. And no one cares if he reads it just in time, 1 min after posting or 10 mins after posting. In other words: everyth
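The figures in this example work out as follows, assuming one byte per character (plain ASCII; multi-byte UTF-8 text would be larger):

```latex
\[
10^{6}\,\tfrac{\text{comments}}{\text{h}} \times 100\,\tfrac{\text{B}}{\text{comment}}
  = 10^{8}\,\tfrac{\text{B}}{\text{h}} = 100\,\tfrac{\text{MB}}{\text{h}},
\qquad
\frac{10^{8}\,\text{B}}{3600\,\text{s}} \approx 2.8 \times 10^{4}\,\tfrac{\text{B}}{\text{s}} \approx 28\,\tfrac{\text{kB}}{\text{s}},
\qquad
\frac{10^{6}\ \text{writes}}{3600\,\text{s}} \approx 278\,\tfrac{\text{writes}}{\text{s}}
\]
```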

    • It really depends on what kind of data you have. In the 90s, when OODBs were the next big thing, I was in a project where they tried to shoehorn tabular data and operations into an OODB. The project failed utterly, thanks to the lack of working query languages, etc.; even simple SQL operations became a major pain.
      Schema updates? Forget about them: every second one broke the existing DB and data, etc.
      I assume with NoSQL the situation is rather similar: blazingly fast for certain use cases but utterly unusable

    • by tcr ( 39109 )

      We want to jump on the NoSQL ship

      These comparisons [kkovacs.eu] might be of interest...

  • by roman_mir ( 125474 ) on Thursday July 21, 2011 @03:07PM (#36837954) Homepage Journal

    First you need to learn something useful, like understanding a normal database such as PostgreSQL, SQLite, DB2 or whatever your heart desires (not MySQL, that's just not right). Once you really understand the normal databases and you understand your requirements, only then can you make a statement by going 'nosql' something; otherwise, for most scenarios, it's most likely counterproductive. You are not all FBs out there.

    • Oh, and before you get on my case, I know that FB uses MySQL. The point is you are not all in need of huge quick data caches, and if you are serving static pages from a dynamic source, you are doing something else wrong altogether.

    • I don't know.

      This guy [youtube.com] made a compelling case to use MongoDB over MySQL.

    • First you need to learn something useful, like understand a normal database,
      First you need to learn something challenging, like implementing your own database.

      like PostgreSQL, SQLite, DB2 or whatever your heart desires (not MySQL, that's just not right.)
      like VMS on VAX with its built-in DB, or MUMPS or PICK (not SQL, that is not right).

      Once you really understand the normal databases and you understand your requirements only then you can make a statement by going 'nosql' something,

      Once you really underst

      • I am not a database programmer nor a DBA.

        However I have worked with database software that needed to
        1. Compare several tables in different databases
        2. Do relational logic to analyze relationships, hence an RDBMS.

        A typical business task at work would be to figure out if a discount worked, and by how much, for certain stores in only a certain section of the country. Gee, I would need a SQL join (I hear the booing of the NoSQL evangelists on that) to look at the orders database, the pricing database, as

        • A NoSQL DB does not mean you cannot join...

          Until someone can tell me that a noSQL database can do these things it is all hogwash.

          That is your misconception. There are countless more reasons for NoSQL than for SQL. In every situation where you can calculate your key, which in an extended sense means you "can calculate the exact disk address" of the data to retrieve, NoSQL is several orders of magnitude faster.

          Typically, NoSQL is not meant to REPLACE your SQL/RDBMS solutions, it is meant to COMPLEMENT them. However, with today's hardwar
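On the point that a NoSQL store does not rule out joins: when the store only offers lookups by key, the join can be done in application code, typically by building a hash table on one side and probing it with the other. A sketch over two hypothetical key-value "collections" of orders and prices; this is a generic hash join, not a feature of any particular NoSQL product.

```python
# Two hypothetical key-value collections, as a document store might return them.
orders = {
    "o1": {"sku": "A", "store": "north", "units": 3},
    "o2": {"sku": "B", "store": "south", "units": 2},
    "o3": {"sku": "A", "store": "south", "units": 5},
}
prices = {
    "A": {"price": 9.99, "discount_pct": 10},
    "B": {"price": 4.50, "discount_pct": 0},
}

def hash_join(left, right, key_field):
    """Inner join: pair each left document with the right document whose key
    equals left[key_field]. `right` is already a dict keyed by that value, so
    it serves as the hash table and each probe is a single lookup."""
    for left_id, doc in left.items():
        match = right.get(doc[key_field])
        if match is not None:                 # drop orders with no matching price
            yield left_id, {**doc, **match}

joined = dict(hash_join(orders, prices, "sku"))
discounted_units = sum(d["units"] for d in joined.values() if d["discount_pct"] > 0)
print(discounted_units)  # 8
```

The trade-off is that the application, not the database, now owns the join logic and its memory cost.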

      • Claiming that only SQL (and RDBMSs) is right is like claiming only Windows is the right OS. It simply shows you never saw any other OS and have no clue at all.

        While my advice is actually useful to people who might otherwise go in the wrong direction, yours is just stupid and pretentious and doesn't even apply to me, since I did enough work for AT&T, Bell Canada, Symcor and IFDS to have worked with some things you may not even recognize as databases.

        Yes, for the majority of people RDBMS is correct, both from their business perspective and the skill sets necessary.

        • Yes, for the majority of people RDBMS is correct, both from their business perspective and the skill sets necessary.

          I doubt that.
          Either there are no longer DBAs like there used to be 20-30 years ago, or business demand has grown far, far more than DBs could keep up with.
          In the last 20 years I never saw a DB that could meet the demands of the business.
          That includes a "cluster" of 4 M4000 servers with lots of attached terabytes of storage, in this case based on Sybase, not Oracle.
          The majority of people are just storing records into on

  • I was a Notes programmer a decade ago... (wow...) I went to a talk on CouchDB and it all seemed strangely familiar.

    Basically, Lotus Notes is a NoSQL database with an email and calendar program attached. Of course, anything was better than LotusScript, but I can see why this stuff is very appealing. I think some of the CouchDB developers are former Notes developers.

    • by kiatoa ( 66945 )

      Whilst a captive user of Lotus Notes at IBM I frequently grumbled about it. In retrospect I really didn't appreciate how good it was and how much easier it made my life. I regularly synced my mail to Linux and to Windows and was able to seamlessly work offline. If it was an easy install on Linux I'd seriously consider dropping the $100 or so for a copy and I don't own *any* commercial software.

      The "slosh data around model" has a strong appeal and Notes seemed to mostly do it pretty well. In a similar way th
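The "slosh data around" model behind Notes-style offline work rests on replicas whose documents carry revision markers, so copies edited while disconnected can be reconciled later. The toy sketch below uses a single revision counter per document and keeps the higher revision on sync, flagging equal revisions with different content as conflicts. This is an illustrative simplification (CouchDB's actual replication tracks revision histories), and the function name is hypothetical.

```python
def sync(local, remote):
    """Two-way merge of replicas shaped {doc_id: (rev, body)}. The higher
    revision wins on both sides; equal revisions with different bodies are
    left untouched and reported as conflicts."""
    conflicts = []
    for doc_id in set(local) | set(remote):
        a, b = local.get(doc_id), remote.get(doc_id)
        if a == b:
            continue                          # already in sync
        if a is None or (b is not None and b[0] > a[0]):
            local[doc_id] = b                 # remote copy is newer (or local lacks it)
        elif b is None or a[0] > b[0]:
            remote[doc_id] = a                # local copy is newer (or remote lacks it)
        else:
            conflicts.append(doc_id)          # same revision, different content
    return sorted(conflicts)

laptop = {"memo": (2, "edited offline"), "todo": (1, "buy milk")}
server = {"memo": (1, "draft"), "agenda": (1, "9am standup"), "todo": (1, "buy eggs")}
conflicts = sync(laptop, server)
print(conflicts)                              # ['todo']
print(server["memo"], laptop["agenda"])       # (2, 'edited offline') (1, '9am standup')
```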

  • by EmperorOfCanada ( 1332175 ) on Thursday July 21, 2011 @04:12PM (#36838856)
    One of the many reasons that programmers I know are adopting these technologies is that it breaks the back of the in-house DBA. Often there are a few in-house DBAs with certifications up the wazoo who squeeze themselves into every project that has to store data (all projects). But somehow their word becomes the final word. Getting a table added to a schema can take days or even weeks, and might not be approved at all. Suddenly, with MongoDB or whatever, the DBA has no possible input. One can make all kinds of arguments for and against relational systems and how valuable a DBA is to the long-term health of a datastore, but from many developers' and project managers' perspective, a modern DBA often acts as a brick wall to delivering on time and on budget.
    • but what about that third pillar? the quality thing?

    • Yeah, and when data is lost and the middleware app crashes, who is to blame? I doubt it's the DBA, as he/she did not implement it. The manager would have some explaining to do to IT on why he thought he could circumvent the DBA and corporate policies.

    • by Tenareth ( 17013 )

      One of the main reasons for this is that the DBAs are the ones that keep the production environment functioning. Devs get to put in whatever random thought that crosses their mind and when it breaks in production and data is lost, or clients are impacted they just shrug and say "Odd, didn't expect that".

      A 'modern' DBA should be trained in whatever development cycle that dev is using, which may include Scrum/Agile, in which case the process would be integrated and the delay of implementation would be greatl

  • If you are curious about the benefits of using MongoDB there is a good explanation here [xtranormal.com].
  • The article didn't cover Amazon SimpleDB (http://aws.amazon.com/simpledb/ [amazon.com]). SimpleDB is part of Amazon AWS, so it's cloud-only. However, if you're planning to deploy on AWS anyway, it makes for a formidable option.
    • by julesh ( 229690 )

      Only a total nut would intentionally choose a solution that ties you to a single hosting provider who have acquired a reputation for kicking off clients they don't like.

  • We use Cassandra for all the user management and virtual file system storage at ClubCompy. It is blazing fast compared to SQL for both reads and writes, and it is very scalable. I've had a node of my storage cluster go down and the whole system stayed up with no data loss, and it can repair itself once I bring the downed node back up.

    Coding to Cassandra is pretty challenging: you have to do all of your data modeling in code or use the new CQL to access the cluster. I wrote about my experiences recently, whe
