Big data. No, that’s not quite big enough. Let’s adjust the reverb on the mic a bit… ahem: BIG DATA-A-A-A-A. There’s quite a bit of buzz these days about Big Data (some might argue it’s just buzz), but there’s relevant substance to the concept and intriguing propositions for the future. But what exactly constitutes Big Data? Big Data is in some ways a transient side effect of the information age: as the amount of data we accumulate, process, transmit, and store grows exponentially, the totality of that data has exceeded our capacity to manage it conventionally. The great invention of the information age, the almighty Relational Database Management System (RDBMS), is ill-equipped to contend with Big Data. Databases store structured relational information in a centralized repository or a highly ordered federation of repositories. As the size of the dataset expands, so too must the size and power behind the database. Naturally, once the data exceeds a certain amazing colossal size, the database required to manage it conventionally exceeds reasonable practicality. Furthermore, as the diversity of the data increases and the speed at which it is collected accelerates, any measure of structure becomes impossible. Even our favorite positronic golden boy is left scratching his head in bewilderment. The technology and innovation surrounding Big Data is very much about solving this particular problem. So how is this going to help us reroute power from the phaser banks to the starboard cappuccino machine? More importantly, what does Big Data have in store for engineering and, more specifically, Product Lifecycle Management (PLM)?
PLM and Big Data are often mentioned in the same conversation these days. However, PLM as we largely know it today is not Big Data. Red Alert. Even in the largest end-to-end implementations thus far, PLM is, at best, medium data. Even a top-shelf project (take the JPL Curiosity rover, for instance) is managing product data on the order of a couple of terabytes. Sure, amassing millions of parts with hundreds of properties traveling through countless workflows, transactions, simulations, and changes is a sizable challenge. But largely we’re still getting away with relational databases, most of which are wholly centralized or closely coupled. PLM and Enterprise Resource Planning (ERP), anyone? So what, then, does a Big Data problem really look like? Try the Large Hadron Collider (LHC), for instance, where the folks at Cern are dealing with data on an entirely different scale. That was the stun setting. This is not.
“Sverre Jarp, chief technology officer at Cern, has the task of ensuring that the particle physics laboratory can access and analyse the 30 petabytes (that is 31,457,280GB) of data from the Large Hadron Collider data annually.
“To put that into context, one year’s worth of Large Hadron Collider (LHC) data is stored on 83,000 physical disks. And it has 150 million sensors delivering data 40 million times per second. One gigabyte of data is sent per second.
“Cern’s task, says Jarp, is the equivalent of searching for one person across a thousand planets.”
For perspective, that’s the entire Curiosity rover product definition dataset in about seventeen minutes’ worth of sensor data. Sure, the LHC is an extreme example, but most Big Data problems are at least an order of magnitude larger than the largest PLM attempts to date. eBay and Walmart, for example, are tackling problems in the petabyte range.
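(A quick back-of-envelope check, assuming roughly one terabyte for the rover’s product definition dataset and taking the one-gigabyte-per-second figure quoted above: 1 TB ≈ 1,024 GB, and 1,024 GB ÷ 1 GB/s ≈ 1,024 seconds, or about 17 minutes; the full couple of terabytes would take closer to half an hour.)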
That’s not to say the information strategy at the core of PLM isn’t Big Data; it most certainly is, but that’s the vision. So far, the manifestation of that vision is not yet at that scale. We’re certainly headed in that direction, but is it at impulse power or warp factor nine? So what’s a relevant Big Data concept for PLM? One example often cited is aggregating customer data like social network activity, reviews, and usage patterns to tie product development into a tight feedback loop (a toy sketch of that idea follows below).
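To make that feedback loop a little more concrete, here’s a minimal, purely illustrative sketch in Python. The signal sources, weights, and scores are invented for the example and don’t reflect any particular PLM product or API; the point is simply folding pre-scored customer signals into a ranked list of components for the next design review.

# Hypothetical sketch: fold customer signals (reviews, social chatter, field
# telemetry) into a ranked issue list for product development. All data and
# weights below are made up for illustration.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Signal:
    source: str       # "review", "social", or "telemetry"
    component: str    # the part of the product the signal concerns
    sentiment: float  # -1.0 (very negative) to +1.0 (very positive), pre-scored upstream

signals = [
    Signal("review", "battery", -0.8),
    Signal("social", "battery", -0.6),
    Signal("telemetry", "battery", -0.4),
    Signal("review", "display", 0.7),
    Signal("social", "hinge", -0.3),
]

# Assumption for the sketch: field telemetry counts for more than social chatter.
weights = {"review": 1.0, "social": 0.5, "telemetry": 1.5}

scores = Counter()
for s in signals:
    scores[s.component] += weights[s.source] * s.sentiment

# Most negative components bubble to the top of the next design review.
for component, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{component}: {score:+.2f}")

In a real pipeline the sources, scoring, and weighting would be far messier, but the loop is the point: customer signals flowing straight back into product decisions.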
But perhaps we’re still not thinking large enough… In any given supply chain, today’s PLM largely exists at the top, with its particular elements often dictated down discrete supply chains. Data interchange occurs along predetermined interfaces with closely coupled architectures. From one company to another, PLM implementations are largely islands locked behind on-premise infrastructures and deep firewalls. We know that cloud technologies are challenging and transforming old infrastructures. So what if those PLM islands weren’t islands at all? What if each company could selectively contribute data into a larger PLM industrial super-network and in return gain the ability to leverage Big Data insights across all participants? Imagine, for example, investigating an off-the-shelf component in a product, where knowledge of that component wasn’t limited to the recent data in your specific PLM or ERP system, but drew on a holistic understanding across the entire open market.
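As a thought experiment only, here’s what a lookup against such a shared pool might return. Every name, field, and number below is invented, and no real service, standard, or vendor API is implied: each participant contributes anonymized records about an off-the-shelf component, and anyone investigating that component sees the aggregate market view rather than just their own history.

# Hypothetical "PLM super-network" sketch: pooled, anonymized component
# records from several participants, aggregated into one market-wide view.
from dataclasses import dataclass
from statistics import mean

@dataclass
class ContributedRecord:
    participant: str     # contributing company (anonymized in practice)
    part_number: str     # off-the-shelf component identifier
    lead_time_days: int  # observed supplier lead time
    failure_rate: float  # field failures per million units
    in_production: bool  # still being designed into products?

# Stand-in for data contributed from many companies' PLM/ERP systems.
shared_pool = [
    ContributedRecord("company-a", "CONN-0042", 21, 12.0, True),
    ContributedRecord("company-b", "CONN-0042", 35, 40.0, False),
    ContributedRecord("company-c", "CONN-0042", 28, 18.0, True),
    ContributedRecord("company-a", "RES-1206", 7, 1.5, True),
]

def market_view(pool, part_number):
    """Aggregate every participant's experience with one component."""
    records = [r for r in pool if r.part_number == part_number]
    if not records:
        return None
    return {
        "part_number": part_number,
        "participants": len({r.participant for r in records}),
        "avg_lead_time_days": mean(r.lead_time_days for r in records),
        "avg_failure_rate": mean(r.failure_rate for r in records),
        "still_designed_in": sum(r.in_production for r in records),
    }

print(market_view(shared_pool, "CONN-0042"))

The aggregation itself is trivial; the hard part is persuading competitors to contribute to the pool at all.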
It sounds like utopian nonsense for today’s engineering paradigm, especially in light of concerns over protecting individual competitive information and intellectual property. However, the rise of social engineering might open the door. Even barring such an extreme transformation of industry, there could be huge potential in a Big Data ocean for PLM islands, providing a shared context to match supply with demand and problems with solutions, and to foster reuse on a whole different level. So what do you think? Inevitable future or unworkable idealism?