HashD - Draft 0

HashD: A Peer-to-Peer Bottom Up PKI and Identity Cryptosystem

Harrison Stahl - @NotEnoughEntrop

info@hashd.in

abstract: A purely peer-to-peer digital trust system which allows for the creation of a bottom up[1] distributed public key infrastructure, identity, and pseudonymity. The network is made up of privately signed PoW blockchains which use existing trust relationships to bootstrap a verifiable network of trust. Trusted parties exchange signed trust messages and sign both theirs and the other parties trust messages into their blockchains. Some block hashes are checkpointed to the Bitcoin blockchain using it as a verified timestamp server as well as a globally available immutable ledger. This facilitates provable identity histories and a method for certificate revocation/key updates. Proof of work included in identity chains adds a verifiable cost function to identity creation. This coupled with trust declarations that must come from other identities, dishonest identities not only affect themselves but other identities that attested to the honesty of that identity. This adds cost for a determined attacker creating a trust network amongst identities they control to boost their trust. As more identities distrust an identity all of the identities that declared trust in that identity will lose trust, eventually making the entire network of the attacker's identities to become distrusted by the rest of the network. Distrusted identities would be worthless thus wasting all of the proof of work put into the attacker's dishonest network.

disclaimer: This is an early draft without all of the references, images, and could use some simplification. The next draft will be written after implmenting an alpha client WIP(5/1/17).

1. Introduction

Identity on the Internet relies mostly on large institutions. While in theory anyone can setup a domain, email server, and certificate authority and have more control over their identity. E-mail and simple certificate authority falls short on many of the desired features of identity. Another solution is tying it to an account with social media networks. While this works well for most people they are centralized institutions which bring with it the weaknesses of any large centralized institution.

There are many features of identity that are desirable and missing from current solutions. One missing feature is the "exit" option from many online identity solutions without losing your history and reputation. In the real world, exit is possible but the friction to do so makes it impracticable for a large portion of the population.

Another problem with many of the existing identity solutions is their ability to prevent many accounts being created on their platform. In the physical world this is not as much of problem on the national level, however, between nation-states, there is no way without expensive intelligence gathering for a state to verify the validity of an identity issued by another state. There is also the issue of the increasing power of machine intelligences that will reduce the effectiveness of anti-spam measures such as captchas. There will need to be new ways of rate limiting that are not dependent on human intelligence tests.

This lack of verification trickles down to other secondary applications that reuse these identities. The low cost of creation means secondary applications have to engineer ways to mitigate cost from abusive users. The lack of feedback and history to identity doesn't allow for a more nuanced view of an entity's trustworthiness.

There are no universal systems in which an identity has the feature of portability. This is due to the lack of universally trusted institutions. Many of the large social networks' business models prohibit the possibility of exit. Identity enables the establishment of trust, this precludes the possibility of a universal centralized identity system. Nation-states have political agreements for immigration and emigration, and visas that work amongst trusted states. However, these agreements cover huge populations, blocking good people and letting in bad sometimes since the finest granularity of control is at a national level. With mass migration and international terrorism, there needs to be a more fine-grained approach to identity as our traditional systems have repeatedly failed us.

What is needed is a digital system tailored for identity/pseudonymity based on cryptographically provable history and explicit trust declarations to allow globally interoperable trust networks. Explicit trust declarations embedded on privately signed PoW blockchains cross-signed with other privately signed PoW blockchains can allow for larger data structures to be verified granted involved parties' trust network connects at some point. The veracity/trustworthiness of these data structures can be calculated on the graph of trust declarations coupled with the combined weight of their cross-signed chains. While you don't need to trust the entity directly there must be enough trust in the trust graph between you that there is reasonable certainty you can trust that identity. In this paper, I propose how such a system can function and how it could be used to address many of the problems existing identity systems have.

2. Identities

An identity is defined as a PoW blockchain with each block being signed by that identity's private key. The block header consists of the previous block header's hash, the merkle root of the block's data, a signature of the previous block header's hash and merkle root, and a nonce used in the proof of work.

Block header format

The use of the signature allows for two things, verification that all the data in the block is vouched for by that identity. The other is protection from outside miners attacking identity chains.

Identities are generated by creating two public key pairs the first one is used to sign the header while the other is stored offline to protect against identity theft. This second key supersedes the first. If an attacker gains access to the first key and starts impersonating that identity. The real identity holder can pick any block that was signed with the first key and build off of that using the second key to sign that block. This invalidates any blocks built off of the chosen block not signed with the second key. The first block using a backup key must be an ID block that declares the next backup key. This cross-signing on to the public Bitcoin blockchain creates a globally available and secure way of determining the validity of public keys.

ID creation and update

Since the PoW of one identity could easily be overshadowed by a determined attacker with a number of the previously exposed keys it could allow an attacker to rewrite larger portions of an identity. By publishing key updates and occasionally checkpointing to the bitcoin blockchain individual chains can now leverage the massive chain weight of Bitcoin to ensure the immutability of their chain even if the attacker has a number of your previous keys. Recovery from both the live key and the secondary being compromised is explained in a later section.

Identity chains only need to share block header information and key updates, this allows for an arbitrary amount data to be stored in an order verified manner on an identity chain is unlimited. Since you don't have to reveal any of the contents of block except in ID blocks it would be possible to deny access until you wish to prove the existence of a piece of information on your chain. There are various encryption methods that could be used to control access however encrypted data storage is not the focus of this paper and we'll just note merkle based file systems such as IPFS and Maidsafe are tackling this problem already.

3. Proof of Work

Cryptosystems like Bitcoin[?] have shown the value of proof of work where no initial trust is assumed. The function of proof of work is that of an infallible energy consumption sensor. It is in a sense a trust processor, it's only useful output is rare information. The rarity of a certain bit of information is guaranteed by the laws of mathematics and physics, assuming the hashing algorithm is sound.

By reusing the Double SHA256 PoW of Bitcoin we can have a relatively simple way to estimate the cost per hash at any block height by looking at difficulty, average block reward, and bitcoin price. This cost metric allows for untrusted pseudonyms to be generated for online services, some services might require a certain threshold of the value of PoW in an identity chain. This adds a cost to bad behavior because it increases the cost of creating replacement identities when they are banned from the service.

With email authentication, an email banned from one service could be reused or resold for use with another service. This lack of reputation further decreases the effective cost of email addresses. The next section will describe a mechanism to allow for this feedback.

4. Trust Declarations

Trust declarations look like any other piece of data added to an identity's blockchain. It is just a specific data type follows a protocol for use and specific meaning. The block hash, hash of the trust declaration, and the other hashes required to verify the trust declaration is shared to the party trust is being declared in. The other party then can add that information to its own blockchain for other identities to see and use to verify.

This signing and incorporation into both blockchains establishes a relationship so that if one party is wronged and they attest to that fact with a negative trust rating of that party. This fact now has some weight because it can be verified that they had some voluntary relationship prior to the negative rating.

The accuser then can reveal any relevant data on their private blockchain to give evidence to their claims. This can be corroborated by other identities in their trust network vouching for the validity of the attested to facts in the accuser's blockchain through the same cross-signing mechanism used for trust.

The "truth" is dependent on the identities place in the trust network graph. Depending on the networks of trust one is a member of, the trustworthiness of another identity's fact can be judged to be true or false. A parallel is to think of war crime trials, depending on who you asked before the war some things were okay and others not. However, after the war, the winner imposes their perspective of right and wrong. In the proposed system there is no winner, trust networks just become disjoint on boundaries of disagreement.

This system essentially creates a consensus reality built upon a verifiable history of actors, your trust relationships determine which reality you believe.

5. Trust Sub-types

The prospect of disjoint trust networks would mean smaller networks reducing the utility value of those networks. One solution is to create context-specific sub-types of trust. To illustrate what is meant by "context specific", just think of the different people in your life and think of things you'd trust or distrust them with. Some you might trust to tell you about a good restaurant but not much beyond that. Others you might trust them to pay you back but maybe not with a secret. Since there is such a wide diversity in the contexts of trust sub-types would not be rigidly defined. The common usage of sub-types would define their meaning. Those not using the proper meaning would become distrusted by those who adopt a different meaning, evolving usage of sub-types much the same way languages evolve.

The dynamics of trust networks change depending on the context of that network different algorithms of trust could be implemented to reflect the different trust models. I'll outline some of the features that are possible. Many of link state protocols designed for physical networks could be adapted for use in trust networks as it is a similar problem space.

For example, you might extend a retail trust type to an online retailer and they extend a customer trust type to you in reciprocity. As a customer, your client could only count retail trust ratings from identities which have the trust type of human and have a good consumer rating. Which human ratings you trust could be a matter of putting a hop count limit in your network of human trust and require that each human trust declaration was by a human identity that doesn't create a cycle in the human trust graph. Another way is to verify the humanity(or any other trust type) is to calculate the chain weight of the other alleged human identities which attest to the humanness of the identity in question. Given a high enough chain weight(cost) it would be safe to assume those identities wouldn't risk becoming distrusted and thus destroying the value sunk into adding weight to the identity.

Trust data types can be scalar values. They can be thought of as signal to noise ratios 1 means it is a lossless link and trust them as much as yourself. 0 would mean no trust, -1 distrust. The degree of trust between any two identities would be the combined trust signal of all trust paths linking them. The trust "signal" leaves your identity as 1 and is multiplied by the trust of each link in the path. If a "signal" drops below a certain value the path is terminated. Multiple trust paths can be included in the trust calculation, paths that are more exclusive to each other offer a higher degree of certainty since the total value of the identities on the path is higher.

6. Identity Recovery

If a very important and valuable identity has both its live and backup key compromised. The attacker is likely to invalidate the current key with the backup key, then issue another new key and cycle out the backup key so the original owner can no longer sign valid blocks on their own chain.

One solution is to rely on the network of trust in your social network. Based on your established trust declarations and cross declarations. If there are two chains claiming to be valid the tie can be resolved by picking the chain which has been signed onto the most identities previously declared as trusted by the identity in question. The trust sub-type could be "identity" signifying that identities with that declaration are trusted to pick the proper identity. These declarations could be hidden until needed since the data is immutable. Both the attacker and victim identities are built from the same chain, the network would be able to see where the chains split. The victim's chain will still receive queries from the network. The victim can publish the trust declarations from before the split and their verifying hashes. This allows the confused identities to resolve the contention by verifying that the victim's chain has signatures from the earlier trust declarations attesting to the chain's validity.

7. Network

Using an identity doesn't require cross-signing with other identities the minimum requirement is creating an ID block and cross-signing with the Bitcoin blockchain. These identities could be used for online services that don't need great trust assurances which are just looking for the anti-spam aspect of identity. This allows for low friction creation of identities not linked to a particular trust network. If trust is not needed the minimum clients are required to do is transmit their block headers and Bitcoin checkpoint transaction IDs so proof of work cost can be calculated and verified by the requester without interaction with other clients.

However to leverage the trust part of this identity network requires the following:

  1. When there is a new trust declaration this is transmitted to the identity that the declaration is relevant to, this may be propagated to other identities in the trust network.
  2. An Identity that receives a trust declaration includes the declarer's block header, declaration, and hashes required to verify block.
  3. Identities transmit requested trust data given the requester has the trust required by the client.
  4. When trying to ascertain how trusted another identity is, an identity floods its trust network with an ASK packet asking if anyone trusts an identity if no link is found it is flooded out from each identity that was asked to their set of identities until a link is made.
  5. Once a link is made a PROOF message is created and sent to the previous link, each identity verifying the previous hop and sending it to the originator, then sending its PROOF message to the next link to be validated. This is repeated until the originator is reached.
  6. Verification information sent in the PROOF message depends on the originator's request, this information includes part or whole block header chains for identities, trust declarations, and hashes required to verify trust declarations.

ASK packet example

requester_sig: 1385hfansui23bnufa,
requester_message:{
  trust:{
    human:1
  },
  hop_count: 4,
  originator_signature: jklasdnf83289QWetret,
  originator_message:{
    originator_pubkey: 7f7899374htiwun,
    destination_pubkey: njkaghi23089htnas,
    latest_bitcoin_block_hash: ...000001e950...,
    verification_options:{
      trust:{
        retail:{
          retail: .85,
          genesis_block_height: 400000,
          max_hop:1,
          PATH:{
            trust:{
              consumer:{
                trusted_by:{
                  type: retail
                  count: 3
                  consumer: .85
                  max_hop: 1
                },
                human:{
                  human:.95,
                  genesis_btc_block_height: 400000,
                  max_hop:35,
                  OR:{
                    history_since_btc_block: 423366,
                    id_chain_weight_btc_bits: 10000
                  }
                }...

In this example pseudo packet, the originator is trying to establish trust with the destination retailer. The verification options defined recursively allow clients to create a trusted chain across subtypes. The originator defines trusted human as one with a human trust signal of .95, not being more distant on the human trust graph than 35 hops, and identities in the chain must be at least roughly 6 months old(current block height 425366). The other requirement is showing at least a history of about two weeks or a chain weight cost of at least 5 USD at current Bitcoin prices. Consumer is then defined by the human type and having at least 3 direct trust declarations of consumer type averaging at least .85 from identities with a retail trust type. While this may seem as circular since a retail type is within the definition of retail type. However, since the consumer trust declarations by retailers include enough information other ASK queries can be set to those identities suspected of not being retailers. Message signature could then be used to incriminate dishonest identities while also allowing people to verify that dishonesty from their perspective in the graph. Going up one more level to the retail context it is defined by having a retail trust of .85 that comes from a trust path defined in the PATH object. The other requirements are that the identity was created before block 400000 and cannot be further than 1 hop away from the termination the trust path defined in the PATH object. PATH objects can be recursively nested to create complex trust proofs. Anything listed in a trust definition is considered a logical AND, any other logic must be put inside of a logic object such as OR.

The outermost layer of this packet is made up of the trust signal strength, hop count, and requester's signature to verify the contents of the packet. The trust signal strength and hop count are updated at each hop. Each client multiplies the trust signal strength sent from the last identity by the trust rating they have with the requester, if it is 1 then it is unchanged. If it falls below a threshold set by the originator in the verification options, the client will drop instead of forwarding it. If the client knows the destination to be untrustworthy it can send a PROOF message containing evidence to the originator through the trust chain that has been built so far, trust being verified at each hop. The ASK packet is stored in the client so that if a PROOF message needs to make it back to the originator each client can remember where to route the packet next to continue building the chain of trust. The latest Bitcoin block hash is used as an easy way to expire old packets out of the buffer if they are not on the chain of trust between the two identities. This also mitigates replay attacks that could make it look like an identity is spamming the network with requests.

From the perspective of the client receiving the ASK packet all it knows is what neighbor in the trust graph the request comes from, verification options, who is asking, and who is being asked for. This reveals minimal information about the path taken, but still allows for some routing. For example by storing an ASK message's originator public key, requester, and trust signal strength you can get a vector telling you what direction the trust is coming from and how strongly you trust them. With that, you could avoid flooding requests since you already know the trust signal loss and which identity to forward it to. Another way to build some sort of routing table is to store when a PROOF message comes back through. At the very least you know the direction that the PROOF message came from. Since you also know what its verification options were and the trust signal strength when the ASK passed through client you can at least have a range of trust signal loss between you and the destination identity.

To get even better efficiency without relying on special router nodes you could use another trust type like "graph" which you could specify a number representing how many hops out you'd like to share that data. Your neighbors will always know they are connected to you but you could get much more efficient routing by having data from neighbors of neighbors or even more degrees out. The more people that reciprocally share graph data the more likely someone has useful routing data for any particular identity. However, the more identities share the less private the graph of their connections is. While the whole chain of trust has to be revealed to the originator when trying to confirm a particular identity. The destination identity can choose which ASK packets they respond to allowing the destination identity to choose which route(s) it send the PROOF messages back through, which would allow the hiding of some relationships. This is assuming there are multiple paths an ASK message took to get there since PROOF messages can only be routed back through an ASK forwarding chain.

PROOF packet format

{ 
  pubkey:njkaghi23089htnas,
    hop_num:7,
    sig: ajgdshlu4ethu23894th,
    verification:{
      block_chain_data:h28tnfia....j23r9jf,
      btc_checkpoint_txid: 3e51af44f7ca31624c334ea69b4ab2e88444b96b64c57d1ae4c02ebeb22f73c3,
      trust_declarations:{
        retail:{
          trust: 1,
          truster_block_hash:00005u8293fjnalui43q2FEDaw3afawtTas,
          verifying_hashes:[iafnjhq3245Qwtadg..., ..., ...adfjgi43t1325asgea]
        } 
      }...

When an ASK packet reaches the identity asked for that identity starts creating a PROOF packet. The packet looks through its blockchain and gathers the information required to satisfy the verification options. This would consist of the block headers of the identity, bitcoin checkpoint transaction id, hashes required to verify trust declaration's validity in the identity's chain and anything else required for the originator to verify declarations. This is then signed by the destination identity and sent to the requester along with the original ASK data which is used by the requester to route the PROOF message back up the path of trust. To save bandwidth each requester sends the previous link's statements once verified against their chain to the originator. Then that link forwards its PROOF to the next link, which does the same forwarding the previous link's PROOF after it has validated it against itself, generating a PROOF message for its neighbor to verify. This process is repeated until the originator is reached while decrementing the hop number each time. This allows originator an easy way for assembling the chain of trust since messages may arrive out of order. The most distant identity's PROOF messages are the first to be sent to the originator.

8. Dishonest Identities

Having the ability to recover an identity if it gets taken over in the cryptographic sense prohibits a mathematically provable way of determining the true identity. There are now two or more chains that are cryptographically valid.

The due to the nature of blockchains it is evident at which block the attack or deception starts. We say attack or deception since from an outside observer's objective view the two are indistinguishable. When the attacker rotates keys out they have to be checkpointed to the Bitcoin blockchain. Since each key update links to the previous key update Bitcoin transaction, it forms a linked chain. Given one ID OP_Return transaction id and access to the full bitcoin blockchain, you can find or compute all links and branches in the chain. The inclusion of ID block's hash allows you to verify that it was generated using a valid signature, granted you could get the required information about that block to verify it. Without verification of the signature, anyone could make it look like an identity branched given they pay the transaction fees on the Bitcoin network for the bogus ID update messages. By using the ID block hash the client could search its trust network for any identities that have seen that block hash and request verifying information.

Say a dishonest identity does something to abuse its trust. It rotates its keys and signs a block at a lower height than the transgressing block, invalidating the end of its chain in an attempting to hide its duplicity. The abused party can then send messages out to its trust network expressing distrust and publish information that can prove that the key rotation took place after the transgression and that the signatures are valid. This strategy could be injurious to people honestly being attacked, however since all of the identity's history is linked, it is trivial to find a pattern in their history. Furthermore, whether an identity is dishonest or attacked, is irrelevant to this system's goal which is to reduce risk.

The trust network to which the evidence of dishonesty is submitted could be a broadcast to all trust context networks. Clients receiving it can choose to accept, deny, or rebroadcast it, the choice made by the client depends on trust relationships and what the user set for rules of acceptance. Since false distrust can be used to effectively censor identities, you have a "who watches the watchmen" problem. How do you know who to trust to tell you who to distrust? As mentioned in section 4 one way to weed out a denial of service/censorship is to only count distrust from identities which have had a verifiable trust relationship previously. However, each party could then distrust each other making it a matter of which identity is more trusted from your perspective in the graph.

If we use a negative value to represent distrust this makes it impossible to think of it in terms of signal propagation. Instead, clients can implement a decay function, distrusting identities that continue to trust a distrusted identity. This allows other identities to put pressure on those in their trust network, to be honest. If they stand behind their friend they will distrust the identity distrusting their friend. To a third party, both networks become less trusted therefore it is in both networks best interest to avoid distrusting each other because anyone outside of the feud will trust all participants less.

Another option is that if an identities reputation is tarnished maliciously a trusted judge could be hired to look at the evidence and make a determination. Identities trusting these judges now have a way to resolve the contention. This solution would also make trusting dishonest identities costly since now you are distrusted by a widely trusted identity.

Diversity of neighbors increases the resilience of your trust network, the farther apart(on the graph) trusted identities are the more variety in the paths to an identity. This allows you to reach more people in fewer hops giving you a more complete view which increases your likelihood of hearing about dishonesty.

After cryptographic proof and trust/evidence fails the next heuristic to go off of is proof of work, however, determined attackers would be able to at least for a short amount of time out hash the true identity holder.

The proof of work aspect of identities adds another layer of protection against identity theft for pseudonyms. If the data on the pseudonym's identity is encrypted the attacker has little to gain by attacking. If the attacker was trying to impersonate the pseudonym after cycling the keys and the pseudonym didn't have any identity trust set up to maintain anonymity clients could rely on chain weight to pick which identity to trust. The logic behind this is that identities are more valuable to the rightful owner than an attacker and therefore would be willing to outspend the attacker. If the attacker gives up, that chain falls behind and all of the attacker's proof of work is wasted. A determined attacker that could outspend the honest identity indefinitely trying to take advantage of the misplaced trust they'd soon become distrusted and people would choose the lighter chain due to better trust. This method takes time and requires dishonesty and therefore damage to those trusting the attacker's branch. However assuming the victim can retain control of its branch anyone can see that it hasn't branched since and has better trust. The branch that becomes distrusted would waste all of the PoW added to it, victims set the cost to attack which an attacker must exceed.

Conclusion

We have proposed a mechanism for building an interlinked network of identities which can exchange and verify trust without having to trust any particular identity. Starting with a simple privately signed proof of work blockchain doesn't allow for many of the features desirable for an identity system. To add a history that cannot be cheaply rewritten or censored we proposed a cross-signing mechanism with the Bitcoin blockchain. We leverage this to provide a secure and verifiable way create a chain of public keys that allows keys to rotated and replaced. Given these primitives, we explored the possibilities of trust as a data type. We also showed how incentives can be structured to reward honesty. Society is a consensus reality created by the institutions that comprise it, having a decentralized trusted history can allow for the creation of a consensus reality constructed from the bottom up which do not depend on centralized institutions for the 'truth'.

References

  1. Resilience Despite Malicious Participants
    Defcon 24
    https://youtu.be/obqPixhxRmw?t=5m45s
    Radia Perlman

  2. Bitcoin: A Peer-to-Peer Electronic Cash System
    https://bitcoin.org/bitcoin.pdf
    Satoshi Nakamoto