Big Data Patents (Digital Intellectual Property Law)

Article By Sandro Sandri 


Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy. The term “big data” often refers simply to the use of predictive analytics, user behaviour analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set.1 “There is little doubt that the quantities of data now available are indeed large, but that’s not the most relevant characteristic of this new data ecosystem.”

Put another way, Big Data is an evolving term that describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information. It is often characterized by the 3Vs: the extreme Volume of data, the wide Variety of data types and the Velocity at which the data must be processed. Although big data doesn’t equate to any specific volume of data, the term is often used to describe terabytes, petabytes and even exabytes of data captured over time.

The need for big data velocity imposes unique demands on the underlying compute infrastructure. The computing power required to quickly process huge volumes and varieties of data can overwhelm a single server or server cluster. Organizations must apply adequate compute power to big data tasks to achieve the desired velocity. This can potentially demand hundreds or thousands of servers that can distribute the work and operate collaboratively. Achieving such velocity in a cost-effective manner is also a challenge. Many enterprise leaders are reluctant to invest in an extensive server and storage infrastructure that might only be used occasionally to complete big data tasks.

As a result, public cloud computing has emerged as a primary vehicle for hosting big data analytics projects. A public cloud provider can store petabytes of data and scale up thousands of servers just long enough to accomplish the big data project. The business only pays for the storage and compute time actually used, and the cloud instances can be turned off until they’re needed again. To improve service levels even further, some public cloud providers offer big data capabilities, such as highly distributed Hadoop compute instances, data warehouses, databases and other related cloud services. Amazon Web Services Elastic MapReduce is one example of big data services in a public cloud.
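To make the idea of distributing work more concrete, the following is a minimal, single-process sketch of the map, shuffle and reduce phases that MapReduce-style systems are built around. It is an illustration only: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and do not come from Hadoop or any cloud provider’s API, and a real cluster would run the map step on many servers in parallel rather than in a loop.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(chunk: str) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as a framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks: List[str]) -> Dict[str, int]:
    # Assumption: each chunk stands in for a block of data that a real
    # cluster would hand to a different server; here they run sequentially.
    emitted = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle_phase(emitted))


if __name__ == "__main__":
    sample_chunks = ["big data big servers", "data in the cloud"]
    print(word_count(sample_chunks))
```

Because each chunk is mapped independently, the same logic can in principle be spread over as many servers as there are chunks, which is what lets capacity be rented for only as long as a job runs.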

Ultimately, the value and effectiveness of big data depend on the human operators tasked with understanding the data and formulating the proper queries to direct big data projects. Some big data tools meet specialized niches and allow less technical users to make various predictions from everyday business data. Still other tools, such as Hadoop appliances, are appearing to help businesses implement a suitable compute infrastructure to tackle big data projects, while minimizing the need for hardware and distributed compute software know-how.


The General Data Protection Regulation, which is due to come into force in May 2018, contains several provisions that have either been drafted with a view to encompassing Big Data-related issues or carry additional weight in the context of Big Data. Let us analyse just two aspects.

– Data processing impact assessment

According to the GDPR, where a type of processing, in particular using new technologies, and taking into account the nature, scope, context and purposes of the processing, is likely to result in a high risk to the rights and freedoms of natural persons, the controller shall, prior to the processing, carry out an assessment of the impact of the envisaged processing operations on the protection of personal data. This criterion is most likely going to be met in cases of Big Data analytics, IoT or Cloud operations, where the processing carries high privacy risks due to the properties of either the technology or the datasets employed. For example, linking geolocation data to a person’s name, surname, photo and transactions and making it available to an unspecified circle of data users can expose the individual to a higher than usual personal safety risk. Involving data from connected IoT home appliances or using a Cloud service to store and process such data is likely to contribute to this risk.

– Pseudonymisation


According to the GDPR, ‘pseudonymisation’ means the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person. At least two aspects link pseudonymisation to Big Data. First, if implemented properly, it may be a way to avoid the need to obtain individual consent for Big Data operations not foreseen at the time of data collection. Second, paradoxically, Big Data operations that combine a potentially unlimited number of datasets also make it harder for pseudonymisation to work as an effective privacy safeguard, since each additional dataset can make re-identification easier.
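One common way to approximate this definition in practice is to replace direct identifiers with a keyed hash, keeping the key as the separately stored “additional information”. The sketch below is a hedged illustration of that idea, not a statement of what the GDPR requires. The record layout (`name`, `subject_id`) and the function names are hypothetical, and a keyed hash alone would not necessarily satisfy the regulation’s organisational requirements.

```python
import hashlib
import hmac
import secrets
from typing import Dict


def pseudonymise(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    Without the key, the output cannot be recomputed from the identifier;
    the key is the 'additional information' to be stored separately.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_record(record: Dict[str, str], secret_key: bytes) -> Dict[str, str]:
    """Return a copy of the record with the hypothetical 'name' field replaced."""
    cleaned = dict(record)
    cleaned["subject_id"] = pseudonymise(cleaned.pop("name"), secret_key)
    return cleaned


if __name__ == "__main__":
    # Assumption: in a real system the key would live in a separate,
    # access-controlled store, not next to the data it protects.
    key = secrets.token_bytes(32)
    original = {"name": "Alice Example", "city": "Lisbon", "purchase": "book"}
    print(pseudonymise_record(original, key))
```

Note that the remaining fields (here `city` and `purchase`) stay in clear text. If enough such fields are combined across datasets, a person may still be singled out, which is the second aspect described above.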


Big data has increased the demand for information management specialists so much that Software AG, Oracle Corporation, IBM, Microsoft, SAP, EMC, HP and Dell have spent more than $15 billion on software firms specializing in data management and analytics. In 2010, this industry was worth more than $100 billion and was growing at almost 10 percent a year: about twice as fast as the software business as a whole. Developed economies increasingly use data-intensive technologies. There are 4.6 billion mobile-phone subscriptions worldwide, and between 1 billion and 2 billion people access the internet. Between 1990 and 2005, more than 1 billion people worldwide entered the middle class, which means more people became more literate, which in turn led to information growth. The world’s effective capacity to exchange information through telecommunication networks was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000 and 65 exabytes in 2007,3 and predictions put the amount of internet traffic at 667 exabytes annually by 2014. According to one estimate, one third of the globally stored information is in the form of alphanumeric text and still image data, which is the format most useful for most big data applications. This also shows the potential of yet unused data (i.e. in the form of video and audio content).
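To put the telecommunication capacity figures in perspective, the short calculation below derives the constant annual growth rate implied by the 1986 and 2007 values quoted above. It is only a worked-arithmetic sketch: the helper name `compound_annual_growth` is hypothetical, and the conversion assumes decimal prefixes (1 exabyte = 1,000 petabytes).

```python
def compound_annual_growth(start_value: float, end_value: float, years: int) -> float:
    """Return the constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


PETABYTES_PER_EXABYTE = 1000  # assumption: decimal (SI) prefixes

capacity_1986_pb = 281                         # figure quoted in the text
capacity_2007_pb = 65 * PETABYTES_PER_EXABYTE  # 65 exabytes, quoted in the text

rate = compound_annual_growth(capacity_1986_pb, capacity_2007_pb, 2007 - 1986)

if __name__ == "__main__":
    print(f"Implied average growth: {rate:.1%} per year")
```

A constant rate is a simplification, since the quoted intermediate figures for 1993 and 2000 suggest that growth was much faster in the later years than in the earlier ones.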

2 “Data, data everywhere”. The Economist. 25 February 2010. Retrieved 9 December 2012.

3 Hilbert, Martin; López, Priscila (2011). “The World’s Technological Capacity to Store, Communicate, and Compute Information”. Science. 332 (6025): 60-65. doi:10.1126/science.1200970. PMID 21310967.


While many vendors offer off-the-shelf solutions for big data, experts recommend the development of in-house solutions custom-tailored to solve the company’s problem at hand if the company has sufficient technical capabilities.


A patent is a set of exclusive rights granted by a sovereign state to an inventor or assignee for a limited period of time in exchange for detailed public disclosure of an invention. An invention is a solution to a specific technological problem and is a product or a process. As such, patents are a form of intellectual property.

A patent does not give a right to make or use or sell an invention.5 Rather, a patent provides, from a legal standpoint, the right to exclude others from making, using, selling, offering for sale, or importing the patented invention for the term of the patent, which is usually 20 years from the filing date6 subject to the payment of maintenance fees. From an economic and practical standpoint however, a patent is better and perhaps more precisely regarded as conferring upon its proprietor “a right to try to exclude by asserting the patent in court”, for many granted patents turn out to be invalid once their proprietors attempt to assert them in court.7 A patent is a limited property right the government gives inventors in exchange for their agreement to share details of their inventions with the public. Like any other property right, it may be sold, licensed, mortgaged, assigned or transferred, given away, or simply abandoned.

The procedure for granting patents, requirements placed on the patentee, and the extent of the exclusive rights vary widely between countries according to national laws and international agreements. Typically, however, a granted patent application must include one or more claims that define the invention. A patent may include many claims, each of which defines a specific property right.

4 WIPO Intellectual Property Handbook: Policy, Law and Use. Chapter 2: Fields of Intellectual Property Protection WIPO 2008

5 “A patent is not the grant of a right to make or use or sell. It does not, directly or indirectly, imply any such right. It grants only the right to exclude others. The supposition that a right to make is created by the patent grant is obviously inconsistent with the established distinctions between generic and specific patents, and with the well-known fact that a very considerable portion of the patents granted are in a field covered by a former relatively generic or basic patent, are tributary to such earlier patent, and cannot be practiced unless by license thereunder.” – Herman v. Youngstown Car Mfg. Co., 191 F. 579, 584-85, 112 CCA 185 (6th Cir. 1911)

6 Article 33 of the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS).

7 Lemley, Mark A.; Shapiro, Carl (2005). “Probabilistic Patents”. Journal of Economic Perspectives, Stanford Law and Economics Olin Working Paper No. 288. 19: 75.


These claims must meet relevant patentability requirements, such as novelty, usefulness, and non-obviousness. The exclusive right granted to a patentee in most countries is the right to prevent others, or at least to try to prevent others, from commercially making, using, selling, importing, or distributing a patented invention without permission.

Under the World Trade Organization’s (WTO) Agreement on Trade-Related Aspects of Intellectual Property Rights, patents should be available in WTO member states for any invention, in all fields of technology,9 and the term of protection available should be a minimum of twenty years.10 Nevertheless, there are variations on what is patentable subject matter from country to country.


European patent law covers a wide range of legislation, including national patent laws, the Strasbourg Convention of 1963, the European Patent Convention of 1973, and a number of European Union directives and regulations in countries which are party to the European Patent Convention. For certain states in Eastern Europe, the Eurasian Patent Convention applies.

Patents having effect in most European states may be obtained either nationally, via national patent offices, or via a centralised patent prosecution process at the European Patent Office (EPO). The EPO is a public international organisation established by the European Patent Convention. The EPO is not a European Union or a Council of Europe institution.[1] A patent granted by the EPO does not lead to a single European patent enforceable before one single court, but rather to a bundle of essentially independent national European patents enforceable before national courts according to different national legislations and procedures.[2] Similarly, Eurasian patents are granted by the Eurasian Patent Office and become after grant independent national Eurasian patents enforceable before national courts.

8 Lemley, Mark A.; Shapiro, Carl (2005). “Probabilistic Patents”. Journal of Economic Perspectives, Stanford Law and Economics Olin Working Paper No. 288. 19: 75. doi:10.2139/ssrn.567883.

9 Article 27.1. of the TRIPs Agreement.

10 Article 33 of the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS).


European patent law is also shaped by international agreements such as the World Trade Organization’s Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPs Agreement), the Patent Law Treaty (PLT) and the London Agreement.


11 “Patent Analytics Solutions That Help Inventors Invent”, Outsell Inc, June 3 2016

Patent data is uniquely suited for big data tools and techniques, because of the high volume, high variety (including related information) and high velocity of changes. In fact, patents are leading the way with big data and analytics in many ways. “The patent space offers a fascinating insight into the potential of big data analytics, rich visualization tools, predictive and prescriptive analytics, and artificial intelligence”.11 Especially recently, big data tools and technologies are being used in several ways in the patent world to transform and improve patent analysis.

Patents and Intellectual Property are gradually gaining significance around the world. This is leading to a bottleneck: large databases and ever-growing information. A new way around the innovation problem is to acquire patents. With examples such as Nokia, Motorola and Twitter, the patent purchases seem rather straightforward. Nokia sold a large chunk of its company to Microsoft, but held on to the crucial patents by signing a licensing deal. It can now earn revenue from the patents licensed to Microsoft. Google bought Motorola and its patents and later sold the company to Lenovo while holding on to the patents. There are ample such examples in the industry.

Transactions of Intellectual Property (IP) are rather complex. For example, a basic requirement to be verified before a patent is granted is novelty. In other words, if prior art describing the invention is found, the application stands to be rejected. Prior art could be in the form of a publication, a blog post, a lecture, a video, or a book. With a massive amount of information being generated, an amount that doubles every 18 months, it is extremely difficult to find prior art. One approach that some organizations follow is crowdsourcing the prior-art search: details about the patent are published on a website, asking IP professionals from around the world to find prior art. The emergence of Big Data analytics, on the other hand, has provided a clear solution. In addition, the outcomes of this method get better and more precise with each operation.
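As a rough illustration of how an analytics-driven prior-art search might rank candidate documents, the sketch below scores each candidate against a claim using TF-IDF weighting and cosine similarity. This is a deliberately small, assumption-laden example. The function names, the smoothed inverse-document-frequency formula and the sample texts are all hypothetical, and production systems would use far richer language models and much larger corpora.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_vectors(documents: List[str]) -> List[Dict[str, float]]:
    """Build one sparse TF-IDF vector per document (smoothed IDF)."""
    tokenized = [tokenize(doc) for doc in documents]
    doc_count = len(tokenized)
    document_frequency = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + doc_count) / (1 + document_frequency[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_prior_art(claim: str, candidates: List[str]) -> List[Tuple[float, str]]:
    """Return candidates sorted from most to least similar to the claim."""
    vectors = tf_idf_vectors([claim] + candidates)
    claim_vector, candidate_vectors = vectors[0], vectors[1:]
    scored = [(cosine(claim_vector, vec), text)
              for vec, text in zip(candidate_vectors, candidates)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    claim_text = "a method for wireless charging of a mobile phone battery"
    documents = [
        "blog post describing wireless charging of a phone battery using induction",
        "lecture on medieval bread baking techniques",
    ]
    for score, text in rank_prior_art(claim_text, documents):
        print(f"{score:.3f}  {text}")
```

A high score only flags a document for human review. Whether it actually anticipates the claimed invention remains a legal judgment.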

Since Big Data analytics is still not commonly used by most government authorities, prior art gets overlooked and many invalid patents are granted. This comes to light when, in litigation, the opposing parties put all their efforts into looking for prior art to invalidate each other’s patents. More often than not, prior art is found or there is an out-of-court settlement. Hence, a concept called the patent wall has gained traction. It is very common for companies to file as well as acquire a number of patents around the technology they are working on. This serves as a defence against litigators and allows the companies to market and sell their products and services without fear of litigation.

The core value of patents is that the invention must be publicly disclosed in exchange for a time-limited monopoly on the invention. Patents are not only a legal asset that can block competitors, they are potentially a business and financial asset. For market participants, patents can provide direct insight into where competitors are headed strategically.

Big Data is the key to unlocking this inherent value. Patent information is comprised of vast data sets of textual data structures involving terabytes of information. When unlocked through Big Data techniques and analysis, the insights are compelling, revealing the direction a technology is headed and even uncovering the roadmap for a specific company’s product plans. But, deriving these insights from the proliferation of information requires truly sophisticated Big Data analysis.

While Big Data is quickly growing as a trend, what’s delivering more value these days are Big Data services that optimize specific data sets and create specialized analysis tools for that data. Technology teams that are dedicated to certain data sets will curate and improve the data, learn the specifics of that data and how best to analyze it, and create self-service tools that are far more useful than generic Big Data technologies.

A key part of the Big Data service is a specialized analysis engine tailored to particular data. For example, a patent analysis engine must understand the dozens of metadata items on each patent in order to group patents correctly and traverse the references. To be most effective, Big Data services need to automatically keep up with the data updates, as patents are living documents that change over time. Even after the patent is finalized and issued, it can be reclassified, assigned to a new owner, reexamined and updated, attached to a patent family or abandoned.
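The grouping and reference traversal described above can be sketched with a very small data model. Everything in this example is hypothetical. Real patent records carry many more metadata fields, and `PatentRecord`, `group_by_family` and `reachable_references` are illustrative names rather than part of any existing patent analytics product.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PatentRecord:
    """A drastically simplified patent record (assumed fields only)."""
    number: str
    family_id: str
    assignee: str
    cites: List[str] = field(default_factory=list)


def group_by_family(records: List[PatentRecord]) -> Dict[str, List[str]]:
    """Group patent numbers by the family they belong to."""
    families: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        families[record.family_id].append(record.number)
    return dict(families)


def reachable_references(start: str, index: Dict[str, PatentRecord]) -> Set[str]:
    """Breadth-first traversal of the citation graph from one patent.

    Returns every patent cited directly or indirectly; cited numbers that
    are missing from the index are still reported but not expanded.
    """
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        record = index.get(current)
        if record is None:
            continue
        for cited in record.cites:
            if cited != start and cited not in seen:
                seen.add(cited)
                queue.append(cited)
    return seen


if __name__ == "__main__":
    sample = [
        PatentRecord("P1", "F1", "Acme", ["P2", "P3"]),
        PatentRecord("P2", "F1", "Acme", ["P3"]),
        PatentRecord("P3", "F2", "Other", ["P1"]),
    ]
    by_number = {record.number: record for record in sample}
    print(group_by_family(sample))
    print(sorted(reachable_references("P1", by_number)))
```

Because reassignment or reclassification changes fields such as `assignee` and `family_id` after grant, an engine along these lines would need to rebuild its groupings whenever the underlying records are updated.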

Most importantly, Big Data services are only as good as the insights they deliver – a Big Data service should provide a specialized user interface that allows real-time, user-driven analysis with search, correlations and groupings, visualizations, drill down and zooms. The patent data analysis must be presented in a manner that is compelling and consistent.

There are more than 22,000 published patent applications between 2004 and 2013 relating to big data and efficient computing technologies, resulting in almost 10,000 patent families. Patenting activity in this field has grown steadily over the last decade and has seen its highest increases in annual patenting over the last two years (2011-2012 and 2012-2013) of the present data set. The growth has continually been above the general worldwide increase in patenting, showing a small increase of 0.4% over worldwide patenting for the 2005-2006 period and showing a maximum increase of 39% for 2012-13.

“Using” a patent effectively means suing a competitor to block their access to the market, or charging them a licence fee for permission to sell. When a patent holder wishes to enforce a patent, the defendant can often argue that the patent should not have been granted, because there was prior art at the time the patent was granted. And, while patent offices do not seem to have a clear incentive to take into account actual reality when reviewing an application, including the exponentially growing information created by Big Data, the situation is very different for a defendant in a patent lawsuit. Defendants have every incentive to establish that the patent should never have been granted, because there was pre-existing prior art and the information in the patent was not new at the time of application. One important consequence of Big Data is that the information available to defendants in this respect will also grow exponentially. This means that the probability of being able to defend against a patent claim on the basis of prior art will grow significantly. Because of the time lag between patent applications and their use in court, the effect of the recent explosion of information as a result of Big Data is not yet very visible in the patent courts.

A patent is, of itself, an algorithm. It describes the process of a technical invention – how it works (at least, that’s what a patent is theoretically supposed to be doing). It is therefore quite possible that a lot of algorithms around analysis of Big Data will become patented themselves. It could be argued that this will act as a counterweight against the declining value and potential of patents.

Many of these algorithms are, in fact, not technical inventions. They are theoretical structures or methods, and could therefore easily fall into the area of non-patentable matter. Algorithmic patents are particularly vulnerable to the ability of others to “innovate” around them. It is quite unlikely that a data analysis algorithm would be unique, or even necessary from a technical point of view. Most data analysis algorithms are a particular way of doing similar things, such as search, clever search, and pattern recognition. There is, in actual fact, a commoditization process going on in respect of search and analytical algorithms. Patents are “frozen” algorithms. The elements of the algorithm described in a patent are fixed. In order to have a new version of the algorithm also protected, the patent will either have to be written very vaguely (which seriously increases the risk of rejection or invalidity) or will have to be followed up by a new patent every time the algorithm is adapted. And the key observation around Big Data algorithms is that, in order to have continued business value, they must be adapted continuously. This is because the data, their volume, sources and behaviour, change continuously.

The consequence is that, even if a business manages to successfully patent Big Data analytical algorithms, such a patent will lose its value very quickly. The reason is simple: the actual algorithms used in the product or service will quickly evolve away from the ones described in the patent. Again, the only potential answer to this is writing very broad, vague claims, an approach that does not work well at all.

80% of all big data and efficient computing patent families (inventions) are filed by US and Chinese applicants, with UK applicants accounting for just 1.2% of the dataset and filing slightly fewer big data and efficient computing patents than expected given the overall level of patenting activity from UK applicants across all areas of technology.

Against this, however, it should be borne in mind that many of the potential improvements in data processing, particularly with regard to pure business methods and computer software routines, are not necessarily protectable by patents and therefore will not be captured by this report. UK patenting activity in big data and efficient computing has, on the whole, increased over recent years and the year-on-year changes are comparable to the growth seen in Germany, France and Japan.12

12 Intellectual Property Office, Eight Great Technologies Big Data A patent overview



• Herman v. Youngstown Car Mfg. Co., 191 F. 579, 112 CCA 185 (6th Cir. 1911)

• Hilbert, Martin; López, Priscila (2011). “The World’s Technological Capacity to Store, Communicate, and Compute Information”. Science. 332 (6025): 60-65.

• Lemley, Mark A.; Shapiro, Carl (2005). “Probabilistic Patents”. Journal of Economic Perspectives, Stanford Law and Economics Olin Working Paper No. 288. 19: 75. doi:10.2139/ssrn.567883.

• Springer, New Horizons for a Data-Driven Economy

• “Data, data everywhere”. The Economist. 25 February 2010. Retrieved 9 December 2012.

• Eight Great Technologies Big Data – A patent overview, Intellectual Property Office

• “Patent Analytics Solutions That Help Inventors Invent”, Outsell Inc, June 3 2016

• Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS).

• Article 33 of the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS).

• Article 27.1 of the TRIPs Agreement.

• WIPO Intellectual Property Handbook: Policy, Law and Use. Chapter 2: Fields of Intellectual Property Protection WIPO 2008