Big Amazing Colossal PLM Data

Big data. No, that’s not quite big enough. Let’s adjust the reverb on the mic a bit… ahem: BIG DATA-A-A-A-A. There’s quite a bit of buzz these days about Big Data (some might argue it’s just buzz), but there’s relevant substance to the concept and intriguing propositions for the future. But what exactly constitutes Big Data? Big Data is in some ways a transient side effect of the information age – as the amount of data we accumulate, process, transmit, and store grows exponentially, the totality of that data has exceeded our capacity to manage it conventionally. The great invention of the information age, the almighty Relational Database Management System (RDBMS), is ill-equipped to contend with Big Data. Databases store structured relational information in a centralized repository or a highly ordered federation of repositories. As the size of the dataset expands, so too must the size and power behind the database. Naturally, once the data exceeds a certain amazing colossal size, the database required to manage it conventionally exceeds reasonable practicality. Furthermore, as the diversity of the data increases and the speed at which it is collected accelerates, imposing any measure of structure becomes impossible. Even our favorite positronic golden boy is left scratching his head in bewilderment. The technology and innovation surrounding Big Data are very much about solving this particular problem. So how is this going to help us reroute power from the phaser banks to the starboard cappuccino machine? More importantly, what does Big Data have in store for engineering and, more specifically, Product Lifecycle Management (PLM)?

PLM and Big Data are often mentioned in the same conversation these days. However, PLM as we largely know it today is not Big Data. Red Alert. Even in the largest end-to-end implementations thus far, PLM is, at best, medium data. Even a top-shelf project, take for instance the JPL Curiosity rover, is managing product data on the order of a couple of terabytes. Sure, amassing millions of parts with hundreds of properties traveling through countless workflows, transactions, simulations, and changes is a sizable challenge. But largely we’re still getting away with relational databases, most of which are wholly centralized or closely coupled. PLM and Enterprise Resource Planning (ERP), anyone? So what, then, does a Big Data problem really look like? Try the Large Hadron Collider (LHC), for instance, and the folks at CERN, who are dealing with data on an entirely different scale. That was the stun setting. This is not.

“Sverre Jarp, chief technology officer at Cern, has the task of ensuring that the particle physics laboratory can access and analyse the 30 petabytes (that is 31,457,280GB) of data from the Large Hadron Collider data annually.

To put that into context, one year’s worth of Large Hadron Collider (LHC) data is stored on 83,000 physical disks. And it has 150 million sensors delivering data 40 million times per second. One gigabyte of data is sent per second.”

“Cern’s task, says Jarp, is the equivalent of searching for one person across a thousand planets.”

For perspective, that’s the entire Curiosity rover product definition dataset in about seventeen minutes’ worth of sensor data. Sure, the LHC is an extreme example, but most Big Data problems are at least an order of magnitude larger than the largest PLM attempts to date. eBay and Walmart, for example, are tackling problems in the petabyte range.
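The back-of-the-envelope numbers above can be checked with a short sketch. It assumes binary units (1 PB = 1024 × 1024 GB, which is what the quoted 31,457,280GB figure implies) and a sustained rate of exactly one gigabyte per second; the function names are made up for illustration. Under those assumptions, one terabyte streams in roughly seventeen minutes, and a full year at that rate lands close to the quoted 30 petabytes.

```python
# Rough unit arithmetic for the LHC figures quoted above.
# Assumptions (not from the source): binary prefixes and a constant 1 GB/s rate.

GB_PER_TB = 1024
GB_PER_PB = 1024 * 1024
SECONDS_PER_DAY = 24 * 60 * 60


def petabytes_to_gigabytes(petabytes):
    """Convert petabytes to gigabytes using binary prefixes."""
    return petabytes * GB_PER_PB


def minutes_to_stream(terabytes, rate_gb_per_s=1.0):
    """Minutes needed to stream `terabytes` of data at `rate_gb_per_s`."""
    seconds = terabytes * GB_PER_TB / rate_gb_per_s
    return seconds / 60


def days_to_stream(petabytes, rate_gb_per_s=1.0):
    """Days needed to stream `petabytes` of data at `rate_gb_per_s`."""
    seconds = petabytes_to_gigabytes(petabytes) / rate_gb_per_s
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    print(petabytes_to_gigabytes(30))        # gigabytes in 30 PB
    print(round(minutes_to_stream(1), 1))    # minutes per terabyte at 1 GB/s
    print(round(days_to_stream(30), 1))      # days to accumulate 30 PB at 1 GB/s
```

Changing `rate_gb_per_s` or the terabyte count shows how quickly a PLM-sized dataset is dwarfed at sensor-stream rates.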

That’s not to say the information strategy at the core of PLM isn’t Big Data – it most certainly is, but that’s the vision. So far the manifestation of that vision is not yet at that scale. We’re certainly headed in that direction – is it at impulse power or warp factor nine? So what’s a relevant Big Data concept for PLM? One example often cited is aggregating customer data like social network activity, reviews, and usage patterns to tie product development into a tight feedback loop. But perhaps we’re still not thinking large enough…

In any given supply chain, today’s PLM largely exists at the top, the particular elements of which are often dictated down discrete supply chains. Data interchange occurs along predetermined interfaces with closely coupled architectures. From one company to another, PLM implementations are largely islands locked behind on-premises infrastructures and deep firewalls. We know that cloud technologies are challenging and transforming old infrastructures. So what if those PLM islands weren’t islands at all? What if each company could selectively contribute data into a larger PLM industrial super-network and in return have the ability to leverage Big Data insights across all participants? Imagine, for example, investigating an off-the-shelf component in a product, where your knowledge of that component isn’t limited to recent data in your specific PLM or ERP system but extends to a holistic understanding across the entire open market.

It sounds like utopian nonsense for today’s engineering paradigm, especially in light of concerns over protecting individual competitive information and intellectual property. However, the rise of social engineering might open the door. Even barring such an extreme transformation of industry, there could be huge potential in a Big Data ocean for PLM islands, providing a shared context to match supply with demand and problems with solutions, and fostering reuse on a whole different level. So what do you think? Inevitable future or unworkable idealism?

  • A thoughtful post!

    Currently, all PLM applications available in the market deal only with a “centralized repository of product data”. When product data volume and diversity grow beyond the manageable limit of conventional RDBMS database models, many PLM software providers will struggle to support business needs. Hence, PLM vendors should adopt and innovate with “Big Data” and “Internet of Things” technologies very soon. However, I feel the real challenge before technology providers will be “real-time processing” of the vast amount of Big Data being generated from hundreds of machines/devices or social media, and converting that into useful business decisions.

    The idea of connecting different PLM islands by selectively sharing product information to create an industry-specific “larger PLM industrial super-network” is an excellent one and the way forward for the engineering and manufacturing world. It is the inevitable future!

    – Jayakumar
    https://www.facebook.com/FaceOfPLM

    • Jayakumar, thanks for your input! It will be interesting to follow how this plays out in the PLM market, especially with cloud disruption.

  • There is no such thing as “Big Data” in engineering, just small pieces that look big when mixed up in one big pot. The problem is that very few are even interested in this big pot; they are only interested in the small pieces.

  • pgarrish

    I agree with Joe that during design and manufacturing there is ‘little’ data in PLM – even the biggest CAD library is small in this sort of scale.

    After delivery it’s a different ball game. Real-time sensors in jet engines, pump and plant monitoring, etc. can all generate data in the TB volume, which gets into the realm of Big(ish) data.

    For me, the interesting stuff is the search capability that Big Data has developed – unstructured search of document contents, even images and videos; searching of text and trees that is light-years ahead of what is currently in the RDBMSs underpinning PLM systems. Searching and exploring large product structures (tens of thousands of parts or more) is painful in most tools, whereas big data tech can handle hundreds of millions and ‘brute force’ search and analyse even free-text data.

    I think the technology to do your federated library is eminently possible. The legal and commercial issues are a different animal. Now if a customer (say NATO, DoD, MoD) mandated it…… that could be interesting 😉

    • Paul, Joe, thanks for contributing to the discussion!
      An interesting perspective with search – opening up the contents of each piece part, document, and representation in PLM such that search can operate independent of any pre-planned taxonomy and/or classification. That’s certainly a big data problem!

      I agree that a federated concept will have difficulty in traditional industries, unless mandated by government customers as part of a larger effort to revolutionize acquisition and supply chain. Considering the troubling results of intra-service “joint” programs, I don’t see this manifesting in A&D at all, despite the collective benefit.

      Now in terms of social engineering – if it really has legs, I can imagine the need to quickly discover and “mash up” disparate design assets across a wide breadth of domains and suppliers. Given a voluntary system where participants receive benefits in return for sharing data, I could see this taking off – if social engineering does as well.
