Working with Real-Time Analytics as a Service (Video)
Dev Patel: I am Dev Patel.
Poulomi Damany: I am Poulomi Damany.
Robin Miller: And these folks work for BitYota, which does analytics, specifically on MongoDB. Why MongoDB, if I may ask?
Poulomi Damany: Okay. I think what we’re seeing is that today’s fast-changing mobile and web apps need the flexibility of a database like MongoDB. You can keep adding features and save that data directly, without having to go through schema changes or create a new data model. So Mongo is being adopted by people who are interested in fast development and new apps, and we want to help those people actually do analytics on their data in Mongo. In keeping with the theme that Internet insights need to be real-time and changes need to be real-time, analytics needs to be real-time as well. That’s why one of BitYota’s areas of focus is MongoDB analytics.
Robin Miller: So you did not choose to go with MySQL, which is owned by Oracle. Why not?
Dev Patel: Hey, Robin, we’re looking to solve the problem of analytics on semi-structured data.
Robin Miller: Yes.
Dev Patel: Semi-structured data is the new flexible data type. As Poulomi explained, application developers can change it as the application evolves; as applications are being tested, they can add new features, and as soon as those features are added or removed, they want to understand the impact on user experience or [comas] based on those new features. All this is happening fast in the semi-structured world, and that’s where operational databases like Mongo are doing extremely well. And as Poulomi was explaining, we want to provide analytics as close as possible to the time the data is produced, rather than having to wait until the end of the day or the end of the week. These new requirements from the industry, analytics at low latency over fresh data, are the kind of problems the industry needs to be thinking about solving, and one way of doing that is to start solving them over the next-generation SQL databases and provide analytics capable of using those.
Poulomi Damany: And really, MySQL is a transactional system. So you might be a startup, and you might throw your payment systems into MySQL, and then your website traffic, your user profiling, and your product catalog into Mongo. Frequently what you want is to join between those two; you want to say, here are all my best customers and how much money they spend, right? So you need a system that allows you to bring multiple structures of data together.
Robin Miller: I keep hearing this emphasis, not just from you guys, on immediate analytics rather than waiting for the end of the day. Aside from a live broadcast, like the Olympics or whatever, who really needs their analytics that quickly?
Dev Patel: Let me give you a couple of examples, Robin.
Robin Miller: Yes.
Dev Patel: If you’re doing a marketing campaign and you’re running promotions, you’ve got end-of-season promotions and end-of-product-line promotions, you’ve got tickets to, and I’ll tease you a little, a cricket match between England and Australia, or a baseball...
Robin Miller: But cricket takes seven days. I’m sorry, I am American, and we think baseball is slow.
Poulomi Damany: Well, you have baseball tickets to the Giants.
Dev Patel: We have a shorter version too, to please the American audience; it finishes in 45 minutes. But you’ve got inventory that will expire, whether it’s tickets or product lines or whatever it may be. You’re doing a promotion, and you want to know whether a promotion to a particular audience is working. You don’t want to wait for the end of the day to understand whether it worked or not. You want to be able to answer analytical questions like: if it worked for this audience, where should we do it next? Where should we push, by geo, by particular audience segments, by particular channel? Am I doing better advertising through Twitter versus Facebook? In many scenarios you need to know all of that quickly, and that’s why you continuously hear that people want analytics sooner than end of day, and why business questions are being answered in a much shorter timeframe than before.
Robin Miller: So it’s good that our IT and programmer audience on Slashdot is aware of this, because frankly they tend not to be very marketing-oriented. I’m not either. But the bosses do come down and say, do this. What you’re saying is that if the IT people remember this, they won’t be going, ???, but will instead come across as educated to their bosses, and they will be ready to spring into action and set up real-time or close to real-time analytics, correct?
Poulomi Damany: Yeah, agreed, and it’s not just for revenue reasons. You could have a new version of an application that’s crashing on certain browsers; do you want to wait 48 hours to understand that, because you have varied desktop users, right? Or you have a fraud situation, or spam going on. In all of these, the need for analytics is moving upstream much, much quicker.
Robin Miller: Those are very good examples, and ones that our IT and programmer audience will definitely get their minds around. What else is real-time analytics good for?
Dev Patel: The ability to join data from different sources is important. Take the example I quoted earlier: your web clickstream or viewstream data, or clickstream data from your mobile app, can come in as JSON documents through MongoDB, but your transactional systems are still MySQL or Oracle. How do you join data between those two streams very quickly? One stream is in JSON format; the other is in a structured format like CSV. How do you join them? Traditional systems can’t integrate with the JSON type, so you have to convert it into a structured form and then do the joins, which means translating the JSON into structured CSV. If your upstream application keeps changing the JSON, then your ETL, which is this translation piece, needs to continuously evolve. How can we have systems that integrate with the native formats, JSON and CSV, and store the data in its native form without the need for translation? Then you’re able to join across both formats to answer questions like: who are my big spenders in the last five minutes? Who are my big spenders in the last hour? How did my big spender influence somebody else in a social stream, so that I got another three new customers in the last hour?
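The JSON-plus-CSV join described above can be sketched as a small in-memory hash join. This is a minimal illustration under stated assumptions, not BitYota's implementation: it assumes Python 3.7+ and uses hypothetical field names (user_id, amount, ts, channel) and made-up sample rows. The JSON documents are read as-is, so a document carrying an extra field needs no schema change, and the CSV payment rows are joined to them on user_id to find the big spenders within a time window.

```python
import csv
import io
import json
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical clickstream events as JSON documents (e.g. exported from MongoDB).
# The second document carries an extra "channel" field; no schema change needed.
CLICKS_JSON = """
[
  {"user_id": "u1", "page": "/tickets", "ts": "2014-06-10T12:05:00"},
  {"user_id": "u2", "page": "/promo", "ts": "2014-06-10T12:40:00", "channel": "twitter"},
  {"user_id": "u3", "page": "/home", "ts": "2014-06-10T09:00:00"}
]
"""

# Hypothetical payment rows as CSV (e.g. exported from MySQL).
PAYMENTS_CSV = """user_id,amount,ts
u1,120.00,2014-06-10T12:10:00
u2,40.00,2014-06-10T12:45:00
u2,75.50,2014-06-10T12:50:00
u3,300.00,2014-06-10T09:05:00
"""


def big_spenders(clicks, payments, now, window, threshold):
    """Return {user_id: (total_spend, channel)} for users whose spend
    inside [now - window, now] meets the threshold, joined on user_id."""
    start = now - window
    # Build side: sum payments per user within the time window.
    totals = defaultdict(float)
    for row in payments:
        ts = datetime.fromisoformat(row["ts"])
        if start <= ts <= now:
            totals[row["user_id"]] += float(row["amount"])
    # Index the JSON docs by user; .get() tolerates documents that lack
    # the optional field. The first document seen per user wins.
    channel_by_user = {}
    for doc in clicks:
        channel_by_user.setdefault(doc["user_id"], doc.get("channel", "unknown"))
    # Probe side: join totals to the JSON index and apply the threshold.
    return {
        uid: (total, channel_by_user.get(uid, "unknown"))
        for uid, total in totals.items()
        if total >= threshold
    }


clicks = json.loads(CLICKS_JSON)
payments = list(csv.DictReader(io.StringIO(PAYMENTS_CSV)))
now = datetime(2014, 6, 10, 13, 0)
print(big_spenders(clicks, payments, now, timedelta(hours=1), threshold=100.0))
# {'u1': (120.0, 'unknown'), 'u2': (115.5, 'twitter')}
```

With a one-hour window ending at 13:00, u1 and u2 qualify and u3's earlier purchase falls outside the window. A real analytics service would run this join inside its query engine rather than loading everything into memory; the sketch only shows the shape of the operation.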
Robin Miller: That’s a good one.
Dev Patel: As information proliferates through social networks, through people communicating by word of mouth on social media, through mobile applications, how do you bring that together to understand things like user value? How do you use it to understand, hey, who’s influencing my staff, as an example? Those are the kinds of examples where data from different sources, arriving in different formats, needs to be joined and understood.
Robin Miller: Well, I mean, let me say something that’s exactly the opposite. We’re being happy and positive and optimistic here, but it sounds to me like one of the big uses of real-time analytics is to spot negatives, like somebody saying your product causes cancer or whatever. It could be anything, but you know what I mean: something like ammunition to fight against being bad-mouthed on social media.
Dev Patel: Absolutely. I mean, look, “your product causes cancer” is a gruesome example. But your point is critical: marketing people want to catch negative messages as much as positive sentiments. It’s the proliferation of negatives, and correcting them, that they want to address very quickly, because news travels fast these days.
Robin Miller: And it does. I mean, this is a real reason why people need to track everything anybody is saying about them. Even if 99.9% is good, that last 0.1% can kill sales, even if it’s false. A classic American example was the Tylenol scare. Branded Tylenol is perfectly safe, the company had done nothing wrong, and yet it had to pull everything that was on the shelves. They freaked out, and they handled it so well that people said, geez, this is a trustworthy product. One question here: JSON is still new. What is JSON?
Poulomi Damany: It’s JavaScript Object Notation; that’s the full form of it. It’s a way to represent a data structure. With the newer technologies and languages, say a Java-based application or an application written in JavaScript, how you represent the data is how you store it. So it becomes very easy for a developer to say, oh, my structure looks like this: Robin Miller is the username, the job is this, work share, and that exact structure gets written into the database. In the old MySQL approach, you would instead write one table with the user, a second with the list of professions, a third with their habits or something, right? So JSON allows you to represent data in the way you want to consume it, rather than the way it needs to be stored for efficient storage.
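To make the contrast concrete, here is a minimal Python sketch, assuming Python 3 and its standard-library json and sqlite3 modules, with SQLite standing in for MySQL. The record's fields and values (username, job, professions, habits) are hypothetical, loosely following the example above. The document version is stored and read back in one piece; the relational version splits the same record across three tables and needs several queries to reassemble it.

```python
import json
import sqlite3

# Hypothetical user record, shaped the way the application consumes it.
user_doc = {
    "username": "Robin Miller",
    "job": "editor",
    "professions": ["writer", "video host"],
    "habits": ["reading"],
}

# Document style: the structure is serialized and stored as-is.
stored = json.dumps(user_doc)
loaded = json.loads(stored)  # one read returns the whole nested structure

# Relational style: the same data is split across three tables.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, job TEXT);
CREATE TABLE professions (user_id INTEGER REFERENCES users(id), profession TEXT);
CREATE TABLE habits (user_id INTEGER REFERENCES users(id), habit TEXT);
""")
cur = db.execute("INSERT INTO users (username, job) VALUES (?, ?)",
                 (user_doc["username"], user_doc["job"]))
uid = cur.lastrowid
db.executemany("INSERT INTO professions VALUES (?, ?)",
               [(uid, p) for p in user_doc["professions"]])
db.executemany("INSERT INTO habits VALUES (?, ?)",
               [(uid, h) for h in user_doc["habits"]])


def load_user(conn, user_id):
    """Reassemble the nested structure from the three normalized tables."""
    username, job = conn.execute(
        "SELECT username, job FROM users WHERE id = ?", (user_id,)).fetchone()
    professions = [r[0] for r in conn.execute(
        "SELECT profession FROM professions WHERE user_id = ? ORDER BY rowid",
        (user_id,))]
    habits = [r[0] for r in conn.execute(
        "SELECT habit FROM habits WHERE user_id = ? ORDER BY rowid",
        (user_id,))]
    return {"username": username, "job": job,
            "professions": professions, "habits": habits}


print(loaded == load_user(db, uid))  # True: same data, different storage shape
```

The trade-off is the one described in the interview: the document shape matches how the application consumes the data, while the normalized tables are shaped for storage and have to be joined back together on read.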
Robin Miller: Okay, now you both work as executives, and we’ll give your positions at BitYota in the text. How does a company like yours, which does nothing but analytics, distribute its technology? Do you open source some of it? Are you an open source company?
Dev Patel: We’re not an open source company; we built our technology ourselves. There are aspects of our technology that we will look to open source in the next 12 months. We want to develop things, and once we’re comfortable with the stability of certain aspects of our technology, we will open them up for more people to use and make it easier for other people to develop on, and so on. That’s a phase of the company we will get into next year, but not this year.
Robin Miller: Okay, so you’re not going with Eric S. Raymond’s “release early and often,” but you’re doing the probably smarter corporate thing of waiting until you have comparatively clean code. Is that right?
Dev Patel: Look, we do develop fast and release often as a development practice within the company, but we want to get a degree of understanding of our own technology and see it mature through use by many customers. Then we’ll look to open source several parts of our technology; whether all of it or not, I don’t know at this juncture.
Poulomi Damany: I mean, I have these discussions about what to open source. I think the fundamental thing with open source technology is that you have to have a problem that somebody wants to help you solve, right? It has to be a big enough pain that people are challenged by it, so that you can build a community around a set of open source technology; otherwise it goes like Yahoo!, where not everything got picked up. So it’s sort of a buyer’s market if you are an open source developer: I want to put my time and my talent towards things that are worthwhile. We want to find those before we say, here is our technology.
Robin Miller: Fair enough. How did the two of you come to be working with databases and searching through data? How did you get started, and what should somebody know if they want to follow in your footsteps?
Dev Patel: I mean, the first thing is to understand what problem you want to solve. Big data is a massive word, probably overused sometimes, but it’s definitely bringing the world to an understanding that there are opportunities and nuggets in data. So first, figure out what problem you really want to solve. Two, have a team that has the ability to solve such a problem: a founding team with significant experience building big data infrastructure for an entire company.
Our CTO, who is not on this interview, is a database guru. He hails from the world of building database systems at Informatics and Oracle, and then he was part of Hadoop’s critical infrastructure development team for over three years before we got together and founded this company. Our other co-founder brings expertise in building real-time infrastructure. So we have a core team supported by a lot of able engineers building that infrastructure. For anybody who wants to build data technologies: understand the specific problem you’re going after, and you’ve got to make sure you have a bloody damn good team out there.
Robin Miller: Following their technology and working with it, I think that’s good advice for anybody in any part of IT.
Dev Patel: Correct.
Robin Miller: So what do you look for? This is for our younger audience, people who are getting started: what do you look for in a potential hire? I’m assuming not just at your company, but since big data is becoming more useful, and analyzing that data is a very big thing and getting bigger, there are a lot of opportunities there. So how do you select the young engineer who comes in to work with you?
Dev Patel: Well, Robin, we’re a startup, and essentially there are two things we look for all the time. One is intellect and the other is attitude. With the right amount of intellect and a huge amount of attitude, any new engineer or any new person coming into any startup will really propel the startup to the next level. Everyone’s contribution is going to be significant. Again, that’s just the nature of any startup; it’s not just ours or big data stuff. So looking for those two qualities in a person, identifying them in an interview very quickly, and, more importantly, showing after the person is hired that you were right on those two qualities, is critical.
MongoDB (Score:2)
Re: (Score:1)
want to increase your value to the company?
find out the corporation's bank account numbers and passwords, and who the top executives are banging when they aren't with their wives.
Real-Time for real? (Score:2)
I'm surprised BitYota chose MongoDB for real-time analytics. Several years ago we attempted a real-time analytics solution with MongoDB, but besides being a not-so-great performer when it comes to counting, its boolean operators were still in their infancy. We ended up ripping it out and replacing it with another back-end solution in a couple of months and never looked back. Has MongoDB changed much to make real-time more realistic?
Re: (Score:1)
Hi Dishwasha, we don't use Mongo for analytics - it is an upstream store for transactional data for us. We are a Data Warehouse Service for Analytics and we enable fast SQL-based analytics in our system for data from Mongo and other JSON (semi-structured) sources it's fast because (a) you don't need to do ETL on your Mongo/JSON data before analysis - so you save that time and the temporal value of fresh data is preserved (b) we have a scale-out MPP architecture to add compute and storage as needed check u
Re: (Score:2)
Good that you clarified that. Because, if it's not DDS [rti.com], it's not really (hard) realtime.
Re: (Score:2)
Is it MongoDB's lack of joins that gives it the run-on sentence capability?
Yeah, right... (Score:3)
Right, because every place I've ever worked in IT, they've been totally transparent and forthcoming about finance, marketing, and investor relations to make the people in the trenches more valuable. Oh wait, no, that never happened....
remember (Score:2)
Re: (Score:2)
you must be interested in everything, especially the shit that we have to sell! be excited! be excited about everything! nothing less than your value as a person is at stake!
Devil's advocate: if you are not excited, why work there?
Valuable employee (Score:3)
Summary says
the more you know about functions in your company besides IT (such as finance, investor relations, and -- yes -- marketing), the more valuable you are as an employee.
Call them old-fashioned, but some employers actually prefer employees to focus on their area of expertise. If there is something to know about other fields, the employer has other experts who will tell them what is needed.