Example with Sample Data
Cooperation can evolve when there are opportunities to work together for mutual advantage without a high risk of betrayal. The risk of betrayal can never be zero, so we can expect people to cooperate when they can bound their risks and reasonably expect to gain from cooperation.
For that reason, the Pygar process is designed to allow cooperation over small bits of information while concealing the rest. Some people have trouble visualizing how that is possible, so we offer this highly simplified example with some fake private data.
The Utility Demonstrated by this Example
This example is motivated by a problem faced by financial institutions that issue consumer credit. Some fraction of their credit cards and loans are issued to individuals impersonating another individual; in other words, a few customers are committing identity theft. It is important to stop identity theft early to minimize losses and legal problems.
In principle, it should be easy to catch identity thieves, who typically open new accounts using information about another party. The legitimate party still has a legitimate account at his or her legal residence. The thief will use a different address so that correspondence regarding the illegitimate accounts is directed away from the victim whose name and financial reputation are being appropriated. The situation can be detected if financial institutions compare their customer lists and addresses. However, in practice there are two barriers to such cooperation. First, the institutions are legally bound not to disclose financial information belonging to their customers, and sharing with another institution would represent disclosure. Second, the institutions are actually competitors. Each fears the other will try to steal its customers by contacting them via telephone or mail. Consequently, this obvious operation requires a negotiation about the content of concealed data, leading ideally to the exchange of a small, controlled amount of information regarding only the suspected identity-theft cases.
To illustrate the operation of negotiated information sharing in this example, we simplify the problem: we look only for individuals with accounts in two different zip codes. The nine-digit postal zip code (ZIP) is a very specific geographic locator, and we can use an individual's social security number (SSN) as the standard identifier for that individual. In more general cases, the broker will need to establish a controlled vocabulary. In this example, however, the formats of SSN and ZIP are well known.
Sample Data
In the example there are two institutions, A and B, with the customer lists shown in Table A and Table B:
Table A – Credit Card Customers of Bank A
family            given    zip         ssn
----------------  -------  ----------  -----------
Hollander         Dave     20114-4671  214-14-7879
Kimber            Eliot    26420-9681  155-47-9782
Magliery          Tom      18603-8576  490-48-8856
Maler             Eve      13646-5203  615-32-7706
Maloney           Murray   79123-5517  695-68-3428
Murata            Makoto   81789-9347  690-95-1189
Nava              Joel     48742-5578  902-60-6893
O'Connell         Conleth  37863-9169  736-80-7781
Paoli             Jean     28667-9364  341-61-9229
Sharpe            Peter    15576-2097  655-68-6926
Sperberg-McQueen  C. M.    26667-2936  411-38-1750
Tigue             John     32122-3390  171-51-9946
Table B – Credit Card Customers of Bank B
family given zip ssn
------ ------ ------ ------
Angerstein Paula 86910-9913 506-75-5891
Bosak Jon 13858-6727 639-55-6905
Bray Tim 59187-3619 623-87-8498
Clark James 87635-2887 528-94-3591
Connolly Dan 28494-9947 234-75-3246
DeRose Steve 77763-2195 435-12-7091
Hollander Dave 20114-4671 214-14-7879
Kimber Eliot 26420-9681 155-47-9782
Magliery Tom 18603-8576 490-48-8856
Maler Eve 13646-5203 615-32-7706
Maloney Murray 38166-1508 695-68-3428
Murata Makoto 81789-9347 690-95-1189
Information to Protect:
Information to Discover in Tables:
Method of Discovery
If there were no concerns about revealing private records or about sharing customer data with a competitor, then the two banks would directly exchange data tables. Then either bank could discover the case of suspected identity theft involving their shared customer Mr. Maloney. One possible method would be to place the tables in a relational database having a query language based on SQL. Then the following SQL query directed to the database will return the result:
select a.family, b.family, a.ssn, a.zip, b.zip from table_a as a
join table_b as b on a.ssn = b.ssn where a.zip <> b.zip
In English the query reads: "Give me the names, zip codes and SSN for customers who are present in both databases with the same social security number but different zip codes."
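The query above can be tried on a few rows of the sample data. The following is a minimal sketch, assuming an in-memory SQLite database from Python's standard library; the function name find_mismatches and the column types are our own illustrative choices, and only three rows from each table are loaded to keep the example short.

```python
import sqlite3

# Rows are (family, given, zip, ssn), copied from Table A and Table B.
TABLE_A = [
    ("Maler", "Eve", "13646-5203", "615-32-7706"),
    ("Maloney", "Murray", "79123-5517", "695-68-3428"),
    ("Nava", "Joel", "48742-5578", "902-60-6893"),
]
TABLE_B = [
    ("Bosak", "Jon", "13858-6727", "639-55-6905"),
    ("Maler", "Eve", "13646-5203", "615-32-7706"),
    ("Maloney", "Murray", "38166-1508", "695-68-3428"),
]

# The query from the text, unchanged.
QUERY = """
    select a.family, b.family, a.ssn, a.zip, b.zip from table_a as a
    join table_b as b on a.ssn = b.ssn where a.zip <> b.zip
"""


def find_mismatches(rows_a, rows_b):
    """Load both lists into a scratch database and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        for table, rows in (("table_a", rows_a), ("table_b", rows_b)):
            conn.execute(
                f"create table {table} "
                "(family text, given text, zip text, ssn text)"
            )
            conn.executemany(f"insert into {table} values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in find_mismatches(TABLE_A, TABLE_B):
        print(row)
```

With these rows, the only record returned is the one for the shared SSN 695-68-3428, which carries a different zip code in each table.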
When the Pygar approach to negotiated information sharing is applied to this example, the data are protected by encryption and then sent to a third-party broker who cannot decrypt the data but can discover the desired correlation between records. This example will illustrate that the discovery process followed by the broker for encrypted data is exactly the same as the process that works for unencrypted data. The only consequence of encryption is that the broker sees only an encrypted result, consistent with the plan to prevent the broker from learning the secrets in the data sets. This will become clear as we look at the data.
Transmittal of the Data to the Broker
All modern business methods rely on standard methods to package data for exchange over the Internet. The method used here is XML because it is the most widely understood and accepted. The data of Table A and Table B can be converted easily, and in many cases automatically, to XML. Banks A and B are working independently, so they need a common format. The broker provides the format. In this example, the broker specifies a small vocabulary of XML tags: <names>, <name>, <family>, <given>, <ssn>, and <zip>. For the field values themselves, the broker does not provide a vocabulary list but simply specifies that names must be written in the Latin-1 alphabet, that social security numbers have their canonical form, and that the zip codes have the precise nine-digit US Post Office format.
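The conversion from table rows to the broker's XML vocabulary can be sketched in a few lines. This is a minimal illustration, assuming Python's standard xml.etree.ElementTree module; the function name rows_to_xml and the two regular expressions are our own assumptions about what "canonical form" and "nine-digit format" mean, not part of any broker specification.

```python
import re
import xml.etree.ElementTree as ET

# Assumed formats: SSN as ddd-dd-dddd, ZIP as ddddd-dddd.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")
ZIP_PATTERN = re.compile(r"\d{5}-\d{4}")


def rows_to_xml(rows):
    """Build a <names> document from (family, given, zip, ssn) rows."""
    root = ET.Element("names")
    for family, given, zip_code, ssn in rows:
        if not SSN_PATTERN.fullmatch(ssn):
            raise ValueError(f"SSN not in the assumed canonical form: {ssn!r}")
        if not ZIP_PATTERN.fullmatch(zip_code):
            raise ValueError(f"ZIP not in the assumed nine-digit form: {zip_code!r}")
        for value in (family, given):
            # Raises UnicodeEncodeError (a ValueError) outside Latin-1.
            value.encode("latin-1")
        name = ET.SubElement(root, "name")
        # Field order follows the listings: family, given, ssn, zip.
        for tag, text in (("family", family), ("given", given),
                          ("ssn", ssn), ("zip", zip_code)):
            ET.SubElement(name, tag).text = text
    # Returns bytes encoded as ISO-8859-1 (Latin-1).
    return ET.tostring(root, encoding="ISO-8859-1")


if __name__ == "__main__":
    sample = [("Maloney", "Murray", "79123-5517", "695-68-3428")]
    print(rows_to_xml(sample).decode("latin-1"))
```

Validating the fields before export matters here because the broker later compares values for exact equality, so both banks must write the same value in exactly the same way.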
The tables are represented as XML in Listing 1 in columns Ex1A and Ex1B:
Listing 1
Ex1A – Bank A's Data |
Ex1B – Bank B's Data |
<?xml version="1.0" encoding="ISO-8859-1"?> <names> <name> <family>Hollander</family> <given>Dave</given> <ssn>214-14-7879</ssn> <zip>20114-4671</zip> </name> <name> <family>Kimber</family> <given>Eliot</given> <ssn>155-47-9782</ssn> <zip>26420-9681</zip> </name> <name> <family>Magliery</family> <given>Tom</given> <ssn>490-48-8856</ssn> <zip>18603-8576</zip> </name> <name> <family>Maler</family> <given>Eve</given> <ssn>615-32-7706</ssn> <zip>13646-5203</zip> </name> <name> <family>Maloney</family> <given>Murray</given> <ssn>695-68-3428</ssn> <zip>79123-5517</zip> </name> <name> <family>Murata</family> <given>Makoto</given> <ssn>690-95-1189</ssn> <zip>81789-9347</zip> </name> <name> <family>Nava</family> <given>Joel</given> <ssn>902-60-6893</ssn> <zip>48742-5578</zip> </name> <name> <family>O'Connell</family> <given>Conleth</given> <ssn>736-80-7781</ssn> <zip>37863-9169</zip> </name> <name title="editor"> <family>Paoli</family> <given>Jean</given> <ssn>341-61-9229</ssn> <zip>28667-9364</zip> </name> <name> <family>Sharpe</family> <given>Peter</given> <ssn>655-68-6926</ssn> <zip>15576-2097</zip> </name> <name> <family>Sperberg-McQueen</family> <given>C. M.</given> <ssn>411-38-1750</ssn> <zip>26667-2936</zip> </name> <name> <family>Tigue</family> <given>John</given> <zip>95819</zip> <ssn>171-51-9946</ssn> <zip>32122-3390</zip> </name> </names> |
<?xml version="1.0" encoding="ISO-8859-1"?> <names> <name> <family>Angerstein</family> <given>Paula</given> <ssn>506-75-5891</ssn> <zip>86910-9913</zip> </name> <name title="chair"> <family>Bosak</family> <given>Jon</given> <ssn>639-55-6905</ssn> <zip>13858-6727</zip> </name> <name> <family>Bray</family> <given>Tim</given> <ssn>623-87-8498</ssn> <zip>59187-3619</zip> </name> <name> <family>Clark</family> <given>James</given> <ssn>528-94-3591</ssn> <zip>87635-2887</zip> </name> <name> <family>Connolly</family> <given>Dan</given> <ssn>234-75-3246</ssn> <zip>28494-9947</zip> </name> <name> <family>DeRose</family> <given>Steve</given> <ssn>435-12-7091</ssn> <zip>77763-2195</zip> </name> <name> <family>Hollander</family> <given>Dave</given> <ssn>214-14-7879</ssn> <zip>20114-4671</zip> </name> <name> <family>Kimber</family> <given>Eliot</given> <ssn>155-47-9782</ssn> <zip>26420-9681</zip> </name> <name> <family>Magliery</family> <given>Tom</given> <ssn>490-48-8856</ssn> <zip>18603-8576</zip> </name> <name> <family>Maler</family> <given>Eve</given> <ssn>615-32-7706</ssn> <zip>13646-5203</zip> </name> <name> <family>Maloney</family> <given>Murray</given> <ssn>695-68-3428</ssn> <zip>38166-1508</zip> </name> <name> <family>Murata</family> <given>Makoto</given> <ssn>690-95-1189</ssn> <zip>81789-9347</zip> </name> </names> |
At this point, the information could be sent to the broker, but then the broker could read it and compromise the data, which is private to the customers and proprietary to the banks. Consequently, Bank A and Bank B both encrypt the data before sending. The broker should select one of the banks to originate a one-way encryption key. It does not matter whether Bank A or Bank B selects the key so long as the broker does not learn what the key is. For the example, Bank B selects the key "8jq6r3llnwc9ytraz56d9" and conveys it to Bank A by a channel protected by public key encryption methods. The banks also agree on the one-way encryption algorithm. For this example, they select the algorithm specified in Internet RFC 2104, entitled "HMAC: Keyed-Hashing for Message Authentication" (for reference see: http://www.faqs.org/rfcs/rfc2104.html ).
The encryption key and algorithm are applied to each tagged text field in the document but not to the tags themselves. After encryption we have encrypted documents suitable for transmitting to the broker. These are shown in Listing 2.
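The field-by-field step described above can be sketched with Python's standard hmac module, which implements RFC 2104. This is only an illustration under stated assumptions: the source does not say which underlying hash function was used (the 32-hex-digit values in Listing 2 are consistent with a 128-bit digest such as MD5, but we do not claim to reproduce them), so the sketch defaults to SHA-256. The function names keyed_digest and protect_document, and the choice to strip surrounding whitespace before hashing, are ours.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

# The shared key from the example, known to both banks but not the broker.
SHARED_KEY = b"8jq6r3llnwc9ytraz56d9"


def keyed_digest(key, text, digestmod=hashlib.sha256):
    """Return the RFC 2104 HMAC of one field value as a hex string."""
    return hmac.new(key, text.encode("latin-1"), digestmod).hexdigest()


def protect_document(xml_text, key):
    """Replace the text of every leaf element with its keyed digest.

    Tag names and document structure are left untouched.
    """
    root = ET.fromstring(xml_text)
    for field in root.iter():
        is_leaf = len(field) == 0
        if is_leaf and field.text and field.text.strip():
            # Assumption: whitespace around a value is not significant.
            field.text = keyed_digest(key, field.text.strip())
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    plain = (
        "<names><name><family>Maloney</family><given>Murray</given>"
        "<ssn>695-68-3428</ssn><zip>79123-5517</zip></name></names>"
    )
    print(protect_document(plain, SHARED_KEY))
```

Because the same key and the same input always give the same digest, two banks that hash the same SSN independently produce identical values, which is what lets the broker match records without reading them.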
Listing 2
Ex1A – Bank A's Data |
Ex1B – Bank B's Data |
<?xml version="1.0" encoding="ISO-8859-1"?> <names> <name> <family> 13fcbb5a294d1757e6fe7a11f6350e2a</family> <given> 469b18ad97fe2ad2697762cf75e9b4ef</given> <ssn> 1affc1f262bb2b5b6d68ac82c0a95b08</ssn> <zip> 5d37efafd6aa96bb73439281326e9365</zip> </name> <name> <family> 9f411be3b3287c0340f819d33988a581</family> <given> f92dd3cca133735afa9250a90d9d9ce8</given> <ssn> ed91abc48d12aa0c9fa076395739ae8f</ssn> <zip> ce787cd12a872126775284bb36f9b8b6</zip> </name> <name> <family> 54562948672b147f43ecd34b38e54465</family> <given> 0dfd0f4b0eb887437ccde15551b0ca08</given> <ssn> becbf5a838063fa98fde59837816f7fd</ssn> <zip> 4fa1a882cc88b8cf678b2400dcc5b50d</zip> </name> <name> <family> 7aafe6c6f88ff18a475904426428b50c</family> <given> 965499084bf3c51fdd1d3fc042970470</given> <ssn> 8156055273ad246327b95e40a625ebe6</ssn> <zip> fc230e26e6ba3101f335dd7b1e34a0d1</zip> </name> <name> <family> d4ab11188fa3f100f1930c8ae07c0916</family> <given> 97ddccf8c83fad09ee6368b37097add0</given> <ssn> f1e715fedb1cdebd102257dc77a758ca</ssn> <zip> a30225c1fb0d0bcc645b07173e9f5853</zip> </name> <name> <family> 5dbc9a176f6006463b5ac15f59100339</family> <given> 363400fb7d263b1901e4b6987f0982f3</given> <ssn> bfeeac009d9a7fdbaf9f5ce5a3e33552</ssn> <zip> f7c9dea00b7506de5c562e56cf133df5</zip> </name> <name> <family> 613804ef73c5d277fa3689c059ead503</family> <given> 754cd803ad6f4017076c4a7678d57597</given> <ssn> c970439ac2180f84bf01c55416240065</ssn> <zip> 50d825269a44e3dbd37480e47e8254dd</zip> </name> <name> <family> ce98f7f3d613b44f9c3ce49a31bbd6bc</family> <given> 29e111228d8cd6fadd74283a4ff1ccf1</given> <ssn> 22fffcd79a7ba77e8cd0a87fbe5e1fdf</ssn> <zip> f3574a1dd8d2529c3309b7a4a411c0c7</zip> </name> <name> <family> 3a87975e3046cf6b2dd677e4da00411b</family> <given> b8782e7b88ce8856abcd3b596d26f9a0</given> <ssn> 37999a93c046cd4b35f1b1b5f1247231</ssn> <zip> e6ad57dfc428a7b2e76dd00eed0ece87</zip> </name> <name> <family> 4f05625d5e8086b0f19094b6733f1e97</family> <given> 
37c4e95510cd1ecaa4c511c238565f43</given> <ssn> 25d8c25b228670cfdc802f6de85aaaf0</ssn> <zip> 691b5b3dd65c6eddaff675aac64b62a2</zip> </name> <name> <family> 73772b88356aeb76672ea70dbe68dac7</family> <given> f9f84f73d6500d61d9533066234c7c19</given> <ssn> f9bd06367543e1ef38ee97a8e67b0072</ssn> <zip> 2992a9a939e0082b07bd929b53144899</zip> </name> <name> <family> 08b36d051077ceb073f310c0910d62aa</family> <given> 9c1c0116d3626b4849ea76f783edb7c4</given> <zip> 90ebb7839021e3a42d250053c6ad630d</zip> <ssn> 3a32e6f48ed53a07c1f0dd1042d2352d</ssn> <zip> 8e4559ecab39a13c87fa1c7ecc7e6379</zip> </name> </names> |
<?xml version="1.0" encoding="ISO-8859-1"?> <names> <name> <family> 86fafbd64435148a93051d3700826443</family> <given> 82eff871b8a1d21c5c239f2ca50105d4</given> <ssn> e6d8b23d2ea992079b5df001e8c44208</ssn> <zip> 89f9209a83d0befaff1b5f4f9a2b21d5</zip> </name> <name> <family> 08debbaacc14ff2066c0c266c8e101ed</family> <given> 61e92c708c08511fc2acb2d35913e75c</given> <ssn> 59a2b0e68abcbcec74f297e6d6313e90</ssn> <zip> 39dbf7f690a92b394237c12ff4fa7d1e</zip> </name> <name> <family> 0ecdfce4b952a5bb5b76af13eb2f0724</family> <given> 8ad80dbfa7204c4aaa948a37c4a743ff</given> <ssn> d375edbd00738b0e7f834df6fcbf58d6</ssn> <zip> d6ba0fd31b5d13440fc385b21a6e1076</zip> </name> <name> <family> 4c81f02c7c958974220ae21768f0e5a7</family> <given> c57b7a774560f5210eb4365b816b2130</given> <ssn> 52cf646279c460a419914789b93a301c</ssn> <zip> e619a9397372bf6419e0d4df33b78ec7</zip> </name> <name> <family> 9f2b853151132565b1ecf400bd686042</family> <given> e0b3159739ac675940a1fbeae32beede</given> <ssn> bdd4dd31890a9d5cc32ca3004d11e269</ssn> <zip> d980fcc718f88c2c41e8323f78ada785</zip> </name> <name> <family> e61473fb76983a98bfa9b3336956d26e</family> <given> fe0779f5354ceaff47f1914a8e5e5146</given> <ssn> c24d4be26b8398bbf7dde81ba65f3292</ssn> <zip> 030908dd07e45bfe2f7e6645417ba976</zip> </name> <name> <family> 13fcbb5a294d1757e6fe7a11f6350e2a</family> <given> 469b18ad97fe2ad2697762cf75e9b4ef</given> <ssn> 1affc1f262bb2b5b6d68ac82c0a95b08</ssn> <zip> 5d37efafd6aa96bb73439281326e9365</zip> </name> <name> <family> 9f411be3b3287c0340f819d33988a581</family> <given> f92dd3cca133735afa9250a90d9d9ce8</given> <ssn> ed91abc48d12aa0c9fa076395739ae8f</ssn> <zip> ce787cd12a872126775284bb36f9b8b6</zip> </name> <name> <family> 54562948672b147f43ecd34b38e54465</family> <given> 0dfd0f4b0eb887437ccde15551b0ca08</given> <ssn> becbf5a838063fa98fde59837816f7fd</ssn> <zip> 4fa1a882cc88b8cf678b2400dcc5b50d</zip> </name> <name> <family> 7aafe6c6f88ff18a475904426428b50c</family> <given> 
965499084bf3c51fdd1d3fc042970470</given> <ssn> 8156055273ad246327b95e40a625ebe6</ssn> <zip> fc230e26e6ba3101f335dd7b1e34a0d1</zip> </name> <name> <family> d4ab11188fa3f100f1930c8ae07c0916</family> <given> 97ddccf8c83fad09ee6368b37097add0</given> <ssn> f1e715fedb1cdebd102257dc77a758ca</ssn> <zip> 7ff0c220b90ac551d6d4969ab68f0229</zip> </name> <name> <family> 5dbc9a176f6006463b5ac15f59100339</family> <given> 363400fb7d263b1901e4b6987f0982f3</given> <ssn> bfeeac009d9a7fdbaf9f5ce5a3e33552</ssn> <zip> f7c9dea00b7506de5c562e56cf133df5</zip> </name> </names> |
Broker's Discovery of the Basis for Negotiation
The broker receives the pair of encrypted documents shown in Listing 2. From these documents, the broker will discover pairs of XML statements about individuals such that one statement of the pair is on the list provided by Bank A and the other is on the list provided by Bank B. There are many technical implementations of this discovery process, but for the example we elect simply to convert the XML documents back to SQL database tables. This conversion is readily accomplished with off-the-shelf software available from Oracle, Microsoft, and others.
Once the data are in tables, the broker simply issues the SQL query shown above to the relational database. The reply informs the broker that an individual whose last name encrypts to the value "d4ab11188fa3f100f1930c8ae07c0916" and whose social security number encrypts to the value "f1e715fedb1cdebd102257dc77a758ca" is found on the lists from both banks, but the zip codes of the billing addresses provided by the two banks differ: the zip code encrypts to "a30225c1fb0d0bcc645b07173e9f5853" in the list from Bank A and to "7ff0c220b90ac551d6d4969ab68f0229" in the list from Bank B.
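The same discovery can also be made directly on the encrypted XML, without a database. The sketch below is one possible alternative to the SQL route, not the method used in the example: it indexes each document by encrypted SSN and reports the SSN values whose encrypted ZIP values differ between the two lists. The two short documents reuse digests copied from Listing 2 (the <given> fields are omitted for brevity); the function names load_records and find_conflicts are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two records per bank, with digests copied from Listing 2.
BANK_A = """<names>
<name><family> d4ab11188fa3f100f1930c8ae07c0916</family>
<ssn> f1e715fedb1cdebd102257dc77a758ca</ssn>
<zip> a30225c1fb0d0bcc645b07173e9f5853</zip></name>
<name><family> 5dbc9a176f6006463b5ac15f59100339</family>
<ssn> bfeeac009d9a7fdbaf9f5ce5a3e33552</ssn>
<zip> f7c9dea00b7506de5c562e56cf133df5</zip></name>
</names>"""

BANK_B = """<names>
<name><family> d4ab11188fa3f100f1930c8ae07c0916</family>
<ssn> f1e715fedb1cdebd102257dc77a758ca</ssn>
<zip> 7ff0c220b90ac551d6d4969ab68f0229</zip></name>
<name><family> 5dbc9a176f6006463b5ac15f59100339</family>
<ssn> bfeeac009d9a7fdbaf9f5ce5a3e33552</ssn>
<zip> f7c9dea00b7506de5c562e56cf133df5</zip></name>
</names>"""


def load_records(xml_text):
    """Map each encrypted SSN to the set of encrypted ZIPs listed with it."""
    records = {}
    for name in ET.fromstring(xml_text).iter("name"):
        ssn = name.findtext("ssn", default="").strip()
        zips = {z.text.strip() for z in name.findall("zip") if z.text}
        records.setdefault(ssn, set()).update(zips)
    return records


def find_conflicts(xml_a, xml_b):
    """Return (ssn, zips_a, zips_b) for shared SSNs whose ZIP sets differ."""
    a, b = load_records(xml_a), load_records(xml_b)
    shared = sorted(a.keys() & b.keys())
    return [(s, sorted(a[s]), sorted(b[s])) for s in shared if a[s] != b[s]]


if __name__ == "__main__":
    for conflict in find_conflicts(BANK_A, BANK_B):
        print(conflict)
```

The comparison uses only equality of digests, so it needs neither the key nor the plaintext. Comparing sets of ZIP values is a design choice of this sketch; it also tolerates a record that carries more than one <zip> element.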
The broker now knows that Bank A and Bank B have a matter to discuss that potentially involves criminal behavior. The broker notifies the banks about this mutual, shared interest in the customer with encrypted social security number "f1e715fedb1cdebd102257dc77a758ca". This completes the broker's role in the negotiation. From that point forward, the two banks deal directly with each other.
Useful Results of the Negotiation
The brokered negotiation over encrypted documents has led to a useful start of an investigation. We can also point out that the privacy of all other individuals who are not connected with the investigation has been strictly preserved. Moreover, the competitive positions of the banks vis-à-vis each other in marketing themselves to customers have been preserved throughout the negotiation because neither bank could inspect the other's customer list.
Other advantages accrue in different situations. For example, when we consider national security issues, it is important to remember that investigative units compete with each other for the rewards of a successful investigation. While each unit may have an organizational directive to capture criminals and terrorists, a unit is only rewarded in the present system if the team makes an arrest. Thus, there is a known reluctance on the part of these units to share information pertaining to on-going investigations. When a broker is involved in negotiations, however, the broker can keep a record of the sharing that takes place. The broker cannot read the encrypted basis of negotiation, but either party in the negotiation can use the broker as a reference to establish that it provided essential information to the other party. Thus, a potential benefit of this negotiation process is improved cooperation due to proper assignment of credit for success.