Getting to Big Data



The 4 Ss that deliver Big Data’s promise


DATA, DATA EVERYWHERE

Perhaps the biggest thing about Big Data is the size of the world’s expectations. Everyone’s heard some version of the ‘beer and nappies’ story and now we want to find dramatic, unexpected insights of our own. The good news: there really is a bright, data-powered, insight-rich future ahead of us and Big Data really is the bullet train that will get us there. The not-so-good news: most companies are simply not set up to realise the benefits of Big Data. Their IT infrastructures and analytics stacks just weren’t built for the job.

This gap between the Big Data promise and Big Data performance is the first Big Data ‘problem’. If you’re going to make it to that big, bright, Big Data future, you’re going to need to define a new approach to how you deal with data. In this eBook we outline what that approach will look like, what changes it will entail and the four defining dimensions for success.

Getting to Big Data

GETTING MESSY
WHY A NEW APPROACH IS NEEDED TO HANDLE BIG DATA

The traditional way of dealing with data is about carefully crafted ‘records’ and detailed ‘schemas’ that map out exactly what the data is and what should be expected from it. But when it comes to Big Data, things start to get messy. It’s estimated that just 5% of the data available is actually structured – meaning the overwhelming majority of data is an unstructured mess.

When we stick to the neat and tidy approach to data, we ignore the 95% of unstructured and multi-structured information (things like websites, videos and huge bodies of text) we have access to.

Making messy work

On the other hand, when we accept the messiness and wade through the jaw-dropping amount of unstructured data out there, we can make connections and find insights we couldn’t possibly have noticed if we confined ourselves to the 5%.

We might lose ‘consistency’ (as defined in the CAP theorem) and our calculations may be approximate, but they’ll still yield hugely valuable business intelligence.

In this new paradigm of inexactness, flexibility is the critical dimension in making sure businesses get the most out of Big Data.
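This ‘good enough’ inexactness can be made concrete. The sketch below (a hypothetical illustration, not taken from any particular analytics stack) estimates an average over a large dataset from a small random sample – approximate, but close enough to act on, at a fraction of the cost of a full scan:

```python
import random

# Hypothetical event values: one million purchase amounts that would be
# expensive to scan in full every time a question is asked.
random.seed(42)
purchases = [random.uniform(5.0, 500.0) for _ in range(1_000_000)]

# Exact answer: a full scan of every record.
exact_avg = sum(purchases) / len(purchases)

# 'Good enough' answer: a 1% random sample.
sample = random.sample(purchases, 10_000)
approx_avg = sum(sample) / len(sample)

# The sampled estimate typically lands within a few percent of the exact
# figure -- close enough for most business decisions.
error = abs(approx_avg - exact_avg) / exact_avg
print(f"exact={exact_avg:.2f} approx={approx_avg:.2f} error={error:.2%}")
```

The same trade-off underpins many Big Data techniques, from sampling to probabilistic counting: give up a little precision, gain orders of magnitude in scale.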

“WE CAN NO LONGER PRETEND TO LIVE IN A CLEAN WORLD.”
PAT HELLAND, MICROSOFT

“WHEN YOU HAVE TOO MUCH DATA, ‘GOOD ENOUGH’ IS GOOD ENOUGH”

Getting to Big Data

ACHIEVING AGILE

BUSINESSES TODAY HAVE ACCESS TO MORE DATA THAN THEY KNOW WHAT TO DO WITH.

Social data, purchase histories, cookie data, user profiles – there’s more than enough data to be improving processes and understanding customers. The problem, as most CTOs know all too well, is that most organisations’ existing analytics stacks were never designed for the challenges of the noisy, unstructured world of Big Data. Re-engineering for Big Data means a new approach that touches on people, processes and technology:


PEOPLE: JOINING FORCES

On the one hand, your people will have to adapt to a new mind-set when they start working with Big Data. They’ll have to learn to accept that the traditional expectations of structured data won’t really apply any more.

But there’s also a more functional change they’ll be adapting to. A Big Data approach will entail the move from a traditional database admin working alone, to a developer and a data scientist working in tandem.

PROCESS: GETTING AGILE

Agile planning and deployment will see your processes become more flexible and your developers more productive. The move to DevOps plus an agile methodology is a killer combination in any Big Data initiative – and helps make sure the smartest people in your team aren’t just doing dishes.

An agile mindset also reflects the iterative nature of Big Data analysis. Traditional analytics are engineered to a plan. Big Data analytics explores.

TECHNOLOGY: BEYOND SQL

There’s great value in relational databases and pre-defined data schemas. But when it comes to Big Data, we can’t afford to be limited by what we know or what we think we’ll need to ask.

NoSQL databases, like Redis and MongoDB, mean you can develop without knowing what the data’s going to be or what questions you’re likely to ask. And distributed data stores for unstructured data, like Hadoop, mean you don’t have to define data schemas – so you don’t have to worry about making changes when you learn something. This makes it feasible to be inquisitive and iterative – essential Big Data qualities.
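The ‘schema-less’ idea can be shown in a few lines. This is a plain-Python sketch of the document model behind stores like MongoDB – no driver, server or real MongoDB API is involved, just the core concept: records in one collection don’t have to share a schema.

```python
# A 'collection' is just a list of documents (plain dicts here).
events = []

# A web click, a support ticket and a video view can live side by side,
# each carrying whatever fields make sense for it -- no schema to declare.
events.append({"type": "click", "url": "/pricing", "user": "alice"})
events.append({"type": "ticket", "user": "bob", "priority": "high", "body": "Login fails"})
events.append({"type": "video", "user": "alice", "seconds_watched": 312})

# Querying tolerates the mess: match on a field, skip documents that lack it.
def find(collection, **criteria):
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]

alice_events = find(events, user="alice")
print(len(alice_events))  # -> 2
```

When a new field turns up next week – say a device identifier – you simply start storing it; nothing existing has to be migrated, which is what makes the iterative, inquisitive style of Big Data work feasible.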


THE 4 Ss IN SUCCESS

LIKE THE FOUR Vs (VOLUME, VARIETY, VELOCITY AND VERACITY) THAT DEFINED THE CHALLENGES OF DATA GROWTH, THERE ARE FOUR DIMENSIONS THAT BUSINESSES MUST MASTER TO MAKE SURE THEIR SOLUTIONS ARE HIGH-PERFORMING, RELIABLE AND EASY TO MANAGE.

SCALABILITY is about the ability to work in a horizontally scaling environment instead of a vertically scaling set up.

SAFETY is about having reliable solutions that protect data and are constantly available to customers.

SPEED is about your applications maintaining the same high performance over sustained periods of heavy traffic and use.

SUPPORT is about having constant access to experts who live and breathe essential Big Data technologies like MongoDB.


1. [S]CALABILITY
MAKING GROWTH (AND SPIKES) EASIER TO MANAGE

Let’s say you’re running a data processing system that crunches 80 million events a day. If your business grows (as you wish it would), that could be eight billion events a day in just a matter of months. Big Data is only going to get bigger and more complex. And a business’ ability to understand its customers and markets is intrinsically linked to its ability to process more data about them.

Data growth spurts

Apple famously demanded data from operators as part of its contracts with them. It did so because the data from these operators would give it a richer picture of how its customers were using their devices. Seeking out more data directly enabled Apple to create better products that reflected actual consumer usage patterns. And businesses that want to thrive in the world of Big Data will be greedy about consuming more data.

To make sure all this data doesn’t overwhelm your resources, you’ll have to develop the skills needed to manage frequent growth spurts.

“IT’S BETTER TO HAVE INFINITE SCALABILITY AND NOT NEED IT, THAN TO NEED INFINITE SCALABILITY AND NOT HAVE IT”
Andrew Clay Shafer (@littleidea), co-founder of Puppet Labs

Capacity traps

On the one hand, this means the ability to scale horizontally and make the most of a distributed environment to extract value from your data. On the other hand, it means managing your processing and storage capacity more intelligently and cost-effectively. You don’t want to inhibit your organisation’s capacity to learn because of a limited capacity to store and process data.
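Horizontal scaling usually comes down to sharding: routing each record to one of N machines by its key. The sketch below (shard names and counts are hypothetical, and simple modulo hashing stands in for the smarter schemes real databases use) shows the basic mechanic:

```python
import hashlib

# Four hypothetical shards. Growth is handled by adding shards,
# not by buying a bigger box.
SHARDS = ["shard-a", "shard-b", "shard-c", "shard-d"]

def shard_for(key: str) -> str:
    # Use a stable hash (Python's built-in hash() is randomised per process)
    # so the same key always lands on the same shard.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# 100,000 event keys spread roughly evenly across the four shards.
counts = {}
for i in range(100_000):
    shard = shard_for(f"user-{i}")
    counts[shard] = counts.get(shard, 0) + 1
print(counts)  # each shard holds roughly a quarter of the keys
```

One caveat worth knowing: with naive modulo hashing, adding a fifth shard remaps most keys, which is why production systems use consistent hashing or range-based chunks instead – the principle of spreading load across commodity machines is the same.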


2. [S]PEED
OPTIMISING YOUR PERFORMANCE TO MATCH EXPECTATIONS

There was a time when many felt that the biggest challenge for Big Data initiatives was a matter of storage. But increasing storage capacity actually looks relatively easy now that you can do it without breaking the bank.

The real challenge lies in being able to analyse all this data at speed. Until you can, it’s just stuff on a disk.

This doesn’t just mean the kind of analytics that power business intelligence functions. It means the moment-by-moment analyses that power today’s high-performing apps (from games and social media to location check-ins and mobile apps).

The closer you get to real-time, the more important high-performance analytics become.

Without great performance, a killer idea in code isn’t going to realise its full potential – because it won’t meet customer expectations in practice. And with fast-scaling apps and solutions, there is often a concern that the end-user experience might suffer. It doesn’t matter what you’re doing in the back end if customers have to deal with fluctuating performance levels.

The Big Data performance ceiling is a bottom-line business limitation.

Clearly, you need high performance over a sustained period. In the context of NoSQL databases, this means there’s a pressing need for speed and consistency in storage, such as Flash and solid state drives (SSDs). While many companies can’t afford this storage themselves, the ‘shared model’ inherent in fully managed services (like ours) makes it affordable. All you need is a Big Data partner who’s invested in speed.


3. [S]AFETY
DELIVERING AVAILABILITY AND SECURITY

The bigger data gets, the harder it becomes to protect it. But if you’re going to thrive as a Big Data business, you can’t afford to make that trade-off. Even the fastest, smartest, most innovative service in the world will count for little if it struggles to stay secure and stay available.

Always on

You might recall the early days of Facebook, when Mark Zuckerberg famously insisted that they would never let their site crash. He understood then that availability issues undermine all the hard work being done by the rest of the company. More importantly, he appreciated the competitive advantage being ‘always on’ afforded his company. The more successful you are, the more pressure you’ll be under to deal with your availability issues.

From your users’ standpoint, downtime reflects frailties in a system they don’t really understand. So expecting them to conclude that ‘these things happen’ is a bad idea. Instead, they’re likely to presume that they can’t trust your organisation and its products. And a bad reputation is a bug that can’t be fixed.

Always secure

Security breaches, no matter how small, are a very big deal. When the breach is unexpected this can be particularly daunting, because it implies a lack of awareness about your architecture.

IN ORDER TO DO SO IT’LL TAKE:

Multiple hot copies of your data with daily back-ups, so there’s always something to work with

Multiple redundant links to the internet, so you’re always connected

Properly encrypted communications between your application and the database

Managing the availability and security concerns in a Big Data world is one of your biggest challenges going forward. But it’s a challenge that pays off for those that get it right.
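The ‘multiple hot copies’ item above maps to replication in practice. Below is a MongoDB-style replica set configuration sketched as a plain Python dict – the set name and host names are hypothetical, and no driver is invoked; it only illustrates the shape such a configuration takes:

```python
# A MongoDB-style replica set configuration, expressed as a Python dict.
# Names and hosts are hypothetical; this is an illustrative sketch only.
replica_set_config = {
    "_id": "bigdata-rs",
    "members": [
        {"_id": 0, "host": "db1.example.com:27017"},  # primary candidate
        {"_id": 1, "host": "db2.example.com:27017"},  # hot copy
        # A hidden, priority-0 member never serves clients; it can be
        # reserved for the daily back-ups mentioned above.
        {"_id": 2, "host": "db3.example.com:27017", "hidden": True, "priority": 0},
    ],
}

# Three data-bearing members means a majority (2 of 3) survives any single
# node failure, so a new primary can still be elected and data stays available.
data_bearing = len(replica_set_config["members"])
majority = data_bearing // 2 + 1
print(data_bearing, majority)  # -> 3 2
```

The redundant network links and encrypted application-to-database traffic from the checklist sit outside the database itself, but the same principle applies: no single point of failure, and no plaintext on the wire.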


4. [S]UPPORT
ACCESSING THE EXPERTISE NEEDED TO GET AHEAD

New disciplines always demand new knowledge. But that knowledge can often be in short supply – and therefore expensive – until the discipline is adopted widely enough. This tends to be a significant roadblock for companies trying to make the most of the greenfields ahead of them, because they don’t really know how to administer new systems effectively. And by the time they learn how to, it won’t confer any real competitive advantage any more.

When open source roots meet business realities

When technology has its roots deep in open source, developers tend to start with a DIY approach. They take on board the open source ethos of systems like MongoDB and look to iterate their way to success. For relatively straightforward, non-mission-critical deployments this can be a lot of fun. You can spend your time getting things right and wait for community support if things aren’t working out.

But when it’s a live, customer-facing application, it’s hard to justify all the waiting and iterating without delivering clear results.

Why support is critical – and expensive

Database administration isn’t just expensive in terms of time; the people required to fulfil these roles usually charge a premium – often six figures – and it’s only fair that they do. It takes proactive monitoring to improve the application consistently and integrate the insights you’re learning from your analyses of all that data. But you don’t want to have to build a 24x7x365 support team in house.

It’s easy to think that none of this matters – that you can plough through without any support. But with the right people and support in place, you can spend more time doing what really matters: creating great applications and growing your business.

NO EXCUSES LEFT

Big Data is here, and the promise is well within the reach of every enterprise and any analytics team.

The opportunity is huge

Considering how many data sources you have, traditional business intelligence delivers just a fraction of the insight actually available. Once you start cleaning, combining and correlating different data sources, you’re going to see all sorts of things. Every aspect of a business’ decision-making process will benefit from having Big Data answer the questions that, so far, we’ve either relied on hunches to answer – or not even asked at all.

The competition isn’t stuck in relational

While older businesses might feel somewhat constrained by the fact that so much of their technology is rooted in relational databases, newer competitors are busy making the most of NoSQL. The fact that younger apps can make updates on an almost daily basis without harming their availability or performance is a massive advantage for them and a critical differentiator for their customers.

It’s all greenfields

When you have access to the right support, it’s really quite incredible how much more businesses are willing to iterate and create new processes that might well become best practice tomorrow. And the early adopters really have the greatest advantage in being able to find the best way to use the agility and flexibility these databases afford them.

Full speed ahead

The benefits of NoSQL are so great that there are really only two choices: take a DIY approach or find a great partner who can deliver the 4 Ss. (One’s extremely expensive and the other comes with fanatical support.) We should talk.


FEELING INSPIRED?
WE’RE HERE TO MAKE YOUR BIG DATA AMBITIONS A REALITY.

At Rackspace we’re completely dedicated to helping organisations make the most of Big Data’s promise. We’re big believers in the possibilities of NoSQL databases and we want to help businesses like yours achieve the 4 Ss. We deliver this with market-leading products like ObjectRocket for MongoDB, and our partnership with Hortonworks for Hadoop. You can find more useful resources on the next page, or give our Fanatical Support team a call any time on +44 (0)20 8712 6538.


FURTHER RESOURCES

More Cloud Resources

eBook: The Disruptive Cloud: How cloud platforms change innovation dynamics

SlideShare: Spikes: Why scaling platforms with cloud is the secret to customer success


eBook: Web Experts: Learn why cloud remains the technology for 2014

SlideShare: The Power of the Hybrid Cloud: A cautionary tale for IT folks and innovators
