Cassandra 0.7 Can Pack 2 Billion Columns Into a Row

angry tapir writes "The cadre of volunteer developers behind the Cassandra distributed database have released the latest version of their open source software, which can hold up to 2 billion columns per row thanks to the new Large Row Support feature in Cassandra 0.7. Previous versions had no set upper limit on the number of columns, though the maximum amount of material that could be held in a single row was approximately 2GB. That 2GB limit has been eliminated."

  • by oldhack ( 1037484 ) on Sunday January 16, 2011 @09:01PM (#34900840)
    What sorta applications need so many columns? Curious.
  • by gratuitous_arp ( 1650741 ) on Sunday January 16, 2011 @09:20PM (#34900970)

    Apparently the extra columns can be used to do "more" than store data. A link in the article explains how lots of extra columns can be useful for querying data (Cassandra doesn't use SQL). http://maxgrinev.com/2010/07/12/do-you-really-need-sql-to-do-it-all-in-cassandra/ [maxgrinev.com]

    So the primary reason for this doesn't seem to be that one's run-of-the-mill database needs more columns.
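The linked post's approach, using the column names themselves as the thing you query on, can be sketched with a toy in-memory model. This is a hypothetical illustration, not Cassandra's client API: `WideRowStore`, `insert` and `slice_columns` are invented names, and the sketch assumes column names are plain strings compared lexicographically. Under that assumption, a date-prefixed column name turns "all of this user's comments from one day" into a range slice over a single row.

```python
from bisect import bisect_left, bisect_right, insort


class WideRowStore:
    """Toy model of one columnfamily: row key -> columns sorted by name.

    Hypothetical illustration only; not Cassandra's client API.
    """

    def __init__(self):
        self._values = {}  # row key -> {column name: value}
        self._names = {}   # row key -> column names kept in sorted order

    def insert(self, row_key, column, value):
        values = self._values.setdefault(row_key, {})
        if column not in values:
            # Keep the name list sorted so range queries stay cheap.
            insort(self._names.setdefault(row_key, []), column)
        values[column] = value

    def slice_columns(self, row_key, start, finish):
        """Return (name, value) pairs with start <= name <= finish."""
        names = self._names.get(row_key, [])
        values = self._values.get(row_key, {})
        lo = bisect_left(names, start)
        hi = bisect_right(names, finish)
        return [(name, values[name]) for name in names[lo:hi]]


# Hypothetical usage: one row per user, one column per comment,
# with the column name "date time|comment id" doing the indexing.
store = WideRowStore()
store.insert("user:example", "2011-01-16 21:01|1001", "first comment")
store.insert("user:example", "2011-01-16 21:31|1002", "second comment")
store.insert("user:example", "2011-01-17 08:08|1003", "third comment")

# Every column whose name starts with "2011-01-16"; "~" sorts after
# the digits, space and "|" characters used in these names.
jan16 = store.slice_columns("user:example", "2011-01-16", "2011-01-16~")
```

Here `jan16` holds the two January 16 columns and skips the January 17 one, without any separate index structure: the sorted column names are the index.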

  • by SQL Error ( 16383 ) on Sunday January 16, 2011 @09:31PM (#34901026)

    The main reason was that Cassandra prior to 0.7 didn't support secondary indexes. Your keys in a table ("columnfamily" in Cassandra-speak) were indexed, and the names of the columns in a row were indexed. And Cassandra is schemaless, so the columns in one row could be completely different to the columns in another.

    So you'd use columns as sub-records to get the data structures you need.

    With 0.7 and secondary indexes, that's going to be less important.
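A secondary index lets you find rows by a column's value rather than by row key. The sketch below is a hypothetical, in-memory illustration of the general idea only: a map from (column name, value) to the set of row keys holding that value, updated on every write. It does not reflect how Cassandra 0.7 actually stores or distributes its indexes, and `IndexedColumnFamily` and its methods are invented names.

```python
from collections import defaultdict


class IndexedColumnFamily:
    """Toy schemaless rows plus a value index on selected columns.

    Hypothetical illustration only; not Cassandra's implementation.
    """

    def __init__(self, indexed_columns):
        self.rows = {}  # row key -> {column name: value}
        self.indexed = set(indexed_columns)
        # (column name, value) -> set of row keys currently holding it
        self.index = defaultdict(set)

    def insert(self, row_key, column, value):
        row = self.rows.setdefault(row_key, {})
        if column in self.indexed and column in row:
            # Overwrite: drop the row key from the old value's entry.
            self.index[(column, row[column])].discard(row_key)
        row[column] = value
        if column in self.indexed:
            self.index[(column, value)].add(row_key)

    def get_by_value(self, column, value):
        # .get() avoids creating empty entries in the defaultdict.
        return sorted(self.index.get((column, value), set()))


users = IndexedColumnFamily(indexed_columns=["state"])
users.insert("alice", "state", "UT")
users.insert("bob", "state", "TX")
users.insert("carol", "state", "UT")
users.insert("carol", "state", "TX")  # update moves carol's index entry

ut = users.get_by_value("state", "UT")  # ["alice"]
tx = users.get_by_value("state", "TX")  # ["bob", "carol"]
```

The extra bookkeeping on overwrite is the price of the index: without it, a stale entry would keep returning `carol` for `"UT"`.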

  • by dirkdodgers ( 1642627 ) on Monday January 17, 2011 @12:14AM (#34901752)

    So I can appreciate that this announcement sounds like News for Nerds, but can someone explain why it matters that Cassandra can support 2 billion columns?

    The article basically says "because you can't execute SQL you need lots of columns". OK, great, why would I want that? The article doesn't tell me. The Cassandra website sure doesn't tell me.

    Oracle 11 supports up to 8 fucking EXABYTES of data in an RDBMS that I can execute SQL against. What Cassandra puts in columns, I put in rows.

    I've scoured this thread like all the other ones on Cassandra for the killer feature, for the "you can do this with Cassandra that you can't do as well with an RDBMS" and I can't find it.

    The best I can come up with is "I want to store lots of indexed data, I don't care about transactional integrity, and I don't want to pay Oracle". Is that it? That's fine if it's it, Oracle doesn't come cheap and that can be a deal breaker for new companies, but I just wish someone would spell out that this is the justification for Cassandra's existence.

  • by DavidTC ( 10147 ) on Monday January 17, 2011 @12:31AM (#34901818)

    Wow, it's almost like you've invented databases, but rotated 90 degrees so that every single existing programming paradigm fails and you have to invent new ones to loop through columns.

    Instead of what every other database does: load the rows you want, and just those rows, with nicely named headers that label the parts of each row. Oh, and types that vary per column.

    And indexes on columns... wait, let me guess, you can now index rows... although that can't actually work, programmatically, because the columns aren't stored next to each other, so locating a value in a specific row can't tell how to retrieve that entire column... WAIT!

    Did this just exchange the meaning of rows and columns in some sort of mindfuck, but left everything the same?

    This is making more and more sense for Bizarro, but not really for anyone else.

  • NoSQL stuff is useful in weird extreme fringe cases, where you need to access data in essentially random ways. Digg, Facebook, and Google all use NoSQL databases, and I think the first two use Cassandra.

    Specifically, you kinda make your own rows. It's like having permanent multiple JOINs that you can access instantly, from what I understand. (This is what the article is talking about; it's now unlimited.)

    Essentially, it's a giant blob of data that exists, and you draw lines on it in advance that are your results, and you can get those results instantly, at the cost of being unable to decide to get other results in real time.

    Many of the products let you have them on different servers, so you can have a 'people who have voted for this Digg' table or something, on the server that handles that thing.
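Drawing the lines in advance is usually called denormalization: each write is duplicated into every row layout you expect to read, so a read is a single row lookup instead of a join. A minimal sketch, assuming the Digg-style vote example above; the two dictionaries stand in for hypothetical columnfamilies, and every name here is invented for illustration.

```python
from collections import defaultdict

# Two stand-in "columnfamilies": row key -> {column name: value}.
story_voters = defaultdict(dict)  # story id -> {user id: vote timestamp}
user_votes = defaultdict(dict)    # user id -> {story id: vote timestamp}


def record_vote(story_id, user_id, timestamp):
    # Write the same fact once per query we plan to serve.
    story_voters[story_id][user_id] = timestamp
    user_votes[user_id][story_id] = timestamp


def voters_for(story_id):
    # One row read; no join needed at query time.
    return sorted(story_voters.get(story_id, {}))


def stories_voted_by(user_id):
    return sorted(user_votes.get(user_id, {}))


record_vote("story:42", "user:alice", 1295226000)
record_vote("story:42", "user:bob", 1295226060)
record_vote("story:7", "user:alice", 1295226120)
```

The trade-off is the one described above: a question nobody planned for (say, "all votes cast in a given hour") has no ready-made row and needs either a new layout, populated on every write, or a full scan.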

    I'm not entirely sure how it works, but that's basically it. Oh, and the fact they talk about 'columns' and 'rows' is just utter stupidity in naming to confuse everyone. Basically, they simply tend to keep each column as a file, which allows them to do what I mentioned above... copy needed columns, and just needed columns, to other servers.

    It's really weird, and, like I said, only relevant for giant giant databases. There's no way that Google could do a full text search on an RDBMS, regardless of whether it fits in Oracle. What it can do is make a 'column' for each word, and a 'row' for each URL, put different columns on different servers, and that actually works in the non-relational database they use, when there's no way in hell that would work on an RDBMS.
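The full-text scheme described above is essentially an inverted index partitioned across machines. The sketch below is a simplified, hypothetical illustration, not how Google or any particular database implements search: for each word it stores the set of URLs containing it (that word's 'column' in the commenter's framing) and assigns each word to one of `NUM_SHARDS` in-process shards, standing in for separate servers. A multi-word query touches only the shards that own those words and intersects their URL sets.

```python
import re
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical cluster size

# Each shard holds entries for a subset of words: word -> set of URLs.
shards = [defaultdict(set) for _ in range(NUM_SHARDS)]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def shard_for(word):
    # crc32 is stable across processes, unlike Python's salted hash().
    return shards[zlib.crc32(word.encode("utf-8")) % NUM_SHARDS]


def index_page(url, text):
    for word in set(tokenize(text)):
        shard_for(word)[word].add(url)


def search(query):
    words = tokenize(query)
    if not words:
        return []
    # Each lookup goes only to the shard that owns that word.
    postings = (shard_for(w).get(w, set()) for w in words)
    return sorted(set.intersection(*postings))


index_page("http://example.com/a", "Cassandra stores wide rows")
index_page("http://example.com/b", "Oracle stores rows")
```

Because each word lives on exactly one shard, adding servers spreads the index without any cross-server join; the intersection step is the only place results from different shards meet.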

    However, more importantly for slashdot, a fuckload of fools think that SQL is somehow 'retarded' and that NoSQL is 'awesome, dude', so they like to play with it, usually by spewing out some crap PHP or Perl or something that works about a tenth as well as just using an RDBMS would work. If they actually understood how to use an RDBMS, that is.

  • by jellomizer ( 103300 ) on Monday January 17, 2011 @08:08AM (#34903298)

    I don't think it is a good idea to impose limitations just to stop bad coding practices.

    First, limitations rarely encourage good practices; they only make bad ones worse. E.g., 254 columns with the 255th pointing to tablename2 with more data.

    Second, by preventing people from doing something stupid, they also prevent them from doing something ingenious.

    Third, there may be a good reason to do this as well.

    Fourth, you make it big enough so you won't need to make it bigger.