Privacy by Design
How baking privacy controls into data-gathering analytic solutions can benefit everyone
IBM Fellow Jeff Jonas says G2 not only helps an organization make more sense of its data, it also helps protect that data. Photography by Jacob Kepler
When people sign up for an online service, they rarely read the terms of agreement, which is understandable: the terms are usually full of legalese and as long as a novel. However, people might be more wary of clicking the “I Understand and Accept the Terms” button if they understood how their data is gathered, shared and used. This is especially true as new technologies make it easier to harvest and parse personal information.
Thankfully, privacy organizations are now encouraging companies to come clean about what they do with customers’ personal information. But without legal teeth behind these efforts, it may be all for naught. That is, unless people like Jeff Jonas, IBM Fellow and chief scientist of entity analytics for IBM Software Group, can further the goal of baking the concept of privacy by design into new solutions.
Jonas’ G2, his “big data analytic sensemaking” engine, as he calls it, offers a more benevolent approach: it allows organizations to become more competitive by making sense of their data while also securing the privacy of the resulting knowledge.
Q. Could you tell us about G2 and its benefits?
A. Organizations want to be more competitive, and to do so they have to be able to spot opportunities and risks faster than their competitors—and make quicker sense of what they’re observing. G2 takes data from different sources, which function like senses: where a person has hearing, sight and touch, an organization has different systems and sensors. It weaves together the data an organization is observing—its observation space—and uses this to construct a model of how it thinks people and things, such as companies, are related to each other. It then uses this model to make high-quality predictions. These predictions might be used to publish a particular targeted ad on a Web page, or to determine whether a person who’s applying for a credit card is the same person you just told no to a few minutes ago.
When you start weaving a lot of different data together, you end up with a lot of knowledge—properly correlated information about people, your products and your supply chain—and you want to make sure, for example, that it’s not inadvertently released. With that in mind, one of the points of G2 is not only to help organizations make more sense of what they know, but also to better protect what they know. This is especially important when you’re talking about what you know about your customers and their personally identifiable information. G2’s purpose is to be smarter and more responsible. That invokes the notion of privacy by design, which says you can bake in privacy-enhancing features, instead of building systems and thinking about privacy only after the fact.
Q. Does G2 have privacy-enhancing features that can’t be turned off?
A. Yes, there are some privacy features that cannot be turned off, like knowing where every piece of your data comes from, or what we call “full attribution.” That’s an important privacy feature because if somebody is taken off a watch list, you had better be able to find that record and remove it from every downstream system. G2 also favors false negatives in a way that cannot be turned off. Favoring false negatives means G2 only asserts that things are the same and related when there’s overwhelming evidence—in other words, it’s not overly optimistic. You can’t unbake that.
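The two properties Jonas describes can be illustrated in a few lines of code. The sketch below is purely hypothetical—the class, method names and scoring rule are invented for illustration and are not G2’s actual design—but it shows the ideas: every record carries its source (full attribution), records are only merged into one entity when the evidence score clears a high bar (favoring false negatives), and because attribution is retained, everything contributed by a retracted source, such as a watch list, can be removed downstream.

```python
# Illustrative sketch only: hypothetical names, not G2's actual API or algorithm.
from dataclasses import dataclass

@dataclass
class Record:
    record_id: str
    source: str      # full attribution: where this observation came from
    features: dict   # e.g. {"name": ..., "dob": ...}

class ConservativeResolver:
    """Links records into entities only on overwhelming evidence,
    and remembers every contributing source record (full attribution)."""

    def __init__(self, match_threshold=0.95):
        self.match_threshold = match_threshold  # high bar: favor false negatives
        self.entities = []  # each entity is a list of contributing Records

    def score(self, a, b):
        # Naive evidence score: fraction of shared feature values.
        keys = set(a.features) & set(b.features)
        if not keys:
            return 0.0
        same = sum(a.features[k] == b.features[k] for k in keys)
        return same / len(keys)

    def add(self, rec):
        for entity in self.entities:
            if all(self.score(rec, member) >= self.match_threshold
                   for member in entity):
                entity.append(rec)   # overwhelming evidence: assert sameness
                return
        self.entities.append([rec])  # otherwise keep separate (a false negative is OK)

    def retract_source(self, source):
        # Full attribution makes downstream removal possible: drop every
        # contribution from a retracted source (e.g. a watch-list removal).
        self.entities = [[m for m in e if m.source != source]
                         for e in self.entities]
        self.entities = [e for e in self.entities if e]
```

With a high threshold, two records merge only when they agree on essentially every shared feature; retracting the “watchlist” source then strips those records out of every entity they helped form.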
Q. You mentioned organizations having observation spaces. Could you explain that in more detail?
A. A bank, for example, has an observation space that includes its customers and, say, its Twitter account, where customers are saying things to or about it. A bank also has a watch list produced by the Department of Treasury—that’s also in its observation space. Another part is people who apply for jobs at or work for the bank. G2’s function is to make sense of your observation space. If you’re not making sense of your observation space, you can miss the obvious, and this can be wasteful.
We found a retailer where two out of every 1,000 people it was hiring had been arrested for stealing from that very same store. That is an example of an organization not being responsible for what it knows. It had somebody who’d been arrested for stealing from it in one pile of data, and two doors down, that same person was in another pile containing job applicants. That’s embarrassing.