[cabfpub] Revocation ballot

Sat Jul 15 20:57:07 MST 2017

While perhaps not true in all cases, I think key compromise situations should be handled similarly to other software security vulnerabilities. Because they are vulnerabilities, we should react to them as we do to other reported issues, and learn from the work and lessons of established responsible disclosure practices (Project Zero is a great example). I doubt you're in the camp that claims all vulnerabilities should be promptly disclosed without a correction period, so I think we can find some common ground on how certificate vulnerability and key compromise reporting should happen.

The remediation window is a vital part of responsible disclosure. It's not merely a nice gesture; it represents the opportunity to work with information security and software development colleagues to remediate vulnerabilities in a way that minimally impacts users. That means reducing the potential harm caused by the vulnerability to the greatest extent possible, while balancing the time pressure that comes from knowing that if one responsible party discovered the issue, it's only a matter of time before another, potentially less responsible, party does likewise.

Of course, waiting on disclosure or revocation doesn't remove the vulnerability. However, responsible disclosure does give developers and IT teams time to properly assess the scope of the issue, update their systems, and deploy fixes before the knowledge is widespread. I agree with prompt revocation, but what constitutes "timely and prompt" is subjective and depends heavily on context and severity.

For non-emergency situations (a vulnerability or compromise is discovered, but there is no evidence of active exploitation and the discovery occurred in such a way that repeat discovery is unlikely to be imminent), 24 hours seems too short to advise on a restructuring, deploy new certificates, and update systems. For emergency situations (a vulnerability is discovered to be undergoing active exploitation, or is publicly available to an extent that re-discovery and exploitation is inevitable), 24 hours is probably too long.

My proposal is that we bifurcate the timelines based on the reason for revocation and impact. A company that changes address and fails to update their certificate within 24 hours is in a different risk category than someone who publishes the key pair used on their shopping site to GitHub.

I think we're in violent agreement that automation is essential within the industry, and that is a key recommendation we always make. Unfortunately, automation isn't available on every device, nor does every entity deploying digital certificates automate for all situations. Generally, if the device operator is disclosing their private key on a public repository like GitHub, they likely did not plan for a revocation that affects all of their devices at once.

Every company has "perverse" incentives to help their customers. Google has its own perverse incentives to keep users on its search engine and browser. Somehow, despite these perverse incentives, we all seem to work towards Internet security in our own fashion. Ours is helping entities (customers and non-customers alike) configure PKI-related systems, deploy certificates, and remediate messes, e.g. those caused by reusing private keys on hardware devices sent to all of their users.

For example, let's say we have a popular program that has 10 million downloads. Unfortunately, they deployed the same private key in every download (note that this was not our recommendation; at this point, they haven't contacted us for recommendations on best practices for key management and protection, etc.). We receive notice at 1 am on a Saturday. Although we maintain a constantly monitored certificate problem reporting process, the supplier does not. Even if we immediately spam every email address we have, and call their emergency numbers, there's no way they can get someone technical on the phone prior to 1 am Sunday. The net effect is that on Sunday at 1 am, each user is potentially blocked from their app. That doesn't help relying parties at all.

Instead, if we received responsible disclosure of the issue, but could wait for a week before revoking, the company could push out an emergency update deploying unique certificates to each device (or re-work how their app communicates internally so that local private keys aren't needed at all; yes, that's what we'd recommend in plenty of cases because we actually do care about what's best for users), eliminating the shutdown. Luckily for us, so far, the revocations have not had detrimental effects quite that widespread.

Seems like there should be a balance we can strike between the need for prompt revocation and the desire not to impact relying parties while still encouraging certificate use.  I proposed two weeks based on the reason for revocation, but I'm certainly open to other suggestions. Maybe it's simply not possible to treat key compromise similarly to other vulnerability disclosures, but I'd like to at least explore the possibility before giving up on it.
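
To make the bifurcated timeline concrete, here is a rough sketch in Python of how a deadline might be computed from the reason for revocation. It is only an illustration of the idea discussed in this thread (24 hours where the customer requested revocation or the issue is public, up to two weeks otherwise). The names `Reason` and `revocation_deadline`, the reason categories, and the exact windows are assumptions for discussion, not ballot text or Baseline Requirements language.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Reason(Enum):
    # Hypothetical categories, loosely based on the examples in this thread.
    SUBSCRIBER_REQUESTED = "subscriber requested revocation"
    KEY_COMPROMISE = "key compromise"
    AGREEMENT_BREACH = "subscriber agreement breach"
    INFO_CHANGE = "certificate information changed"


SHORT_WINDOW = timedelta(hours=24)  # assumed emergency / customer-request window
LONG_WINDOW = timedelta(days=14)    # assumed remediation window


def revocation_deadline(
    reason: Reason,
    reported_at: datetime,
    public_disclosure_at: Optional[datetime] = None,
) -> datetime:
    """Return the latest time by which the certificate should be revoked."""
    # Customer-requested revocations get the short window unconditionally.
    if reason is Reason.SUBSCRIBER_REQUESTED:
        return reported_at + SHORT_WINDOW

    # Everything else starts with the long remediation window...
    deadline = reported_at + LONG_WINDOW

    # ...but public disclosure caps it at 24 hours after the disclosure.
    if public_disclosure_at is not None:
        deadline = min(deadline, public_disclosure_at + SHORT_WINDOW)
    return deadline
```

For instance, under these assumptions a change of address reported at 1 am on July 15 would need to be revoked by 1 am on July 29, while a privately reported key compromise that becomes public two days later would need to be revoked within 24 hours of that disclosure.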

-----Original Message-----
From: Ryan Sleevi [mailto:sleevi at google.com] 
Sent: Friday, July 14, 2017 10:12 AM
To: Jeremy Rowley <jeremy.rowley at digicert.com>
Cc: CA/Browser Forum Public Discussion List <public at cabforum.org>
Subject: Re: [cabfpub] Revocation ballot

On Thu, Jul 13, 2017 at 7:13 PM, Jeremy Rowley <jeremy.rowley at digicert.com> wrote:
> Why tell the CA that their Subscriber was compromised, rather than the 
> Subscriber themselves? Alternatively, if the Subscriber _is_ 
> compromised, then it's absolutely the correct incentive for the 
> researcher to report this directly to the CA, so that Relying Parties
> are not misled, regardless of what steps the Subscriber takes.
>
> [JR] We often get involved directly with our customers on the
> certificate management and deployment side. Thinking of the CA as only
> the certificate manufacturer undervalues the services some CAs provide.
> Forcing CAs out of the advisory role and solely into the issuance role
> eliminates a lot of the value provided. Therefore, we would like to
> get involved early in the process with both the researchers and the
> Subscribers to advise on all PKI issues.
> Although subscribers have an obligation under the agreement to report
> certificate issues, we know this doesn't always happen. Take
> Heartbleed for example. We received minimal prior notification of the
> event. However, when the event was announced, we released a tool that
> detected impacted certificates. With advance notice, we can assist
> customers in navigating both industry and internal events.

I'm sure you can understand, however, that having the CA in the advisory role and the issuance role creates a perverse incentive, in which they're aligned to the issuance and non-revocation of certificates. This is because the relationship is with the site operator, not relying parties, and as such, there's a strong incentive to align with the local incentives of the site operators, rather than those of relying parties or the overall ecosystem. SHA-1 is an excellent example of how these incentives play out - to the harm of users and the overall ecosystem.

I still struggle to find the issue with timely and prompt revocation, but I do hope you can see the overall negative effect if relying parties no longer have assurances that the information from CAs is accurate and trustworthy, which is what this level of flexibility and discretion would ultimately result in.

>> For non-public issues, I'd rather work with the customers earlier 
>> than wait to be brought in until 24 hours before the revocation occurs.
>
> Could you explain or provide an example of this? As the CA - a service 
> provider role - naturally being brought in at the end of a
> customer-impacting issue is the right way to handle it.
> [JR] There's a consulting role involved as well as a certificate 
> issuance role. Many of our customers approach us with questions on the 
> use of PKI and best practices.  Heartbleed is an example. We'd like to 
> remain a consultant in the framework.  Another consideration is the CA 
> is easy for a researcher to contact and report issues to. We have 
> dedicated emails and personnel who handle these issues.  For example, 
> Hanno emailed me directly about compromised private keys. I have no 
> problem with this. To my knowledge, he did not reach out to the 
> subscribers themselves. I also have no problem with this.  As the CA,
> we generally are the ones helping our customers with their certificate
> needs, including helping them through the revocation process.

Well, as the CA, isn't your duty first and foremost to ensure the information attested is honest and accurate? In the case of DV, this is primarily about ensuring the binding between the key and domain (hence the deprecation of "any equivalent method"), and in the case of OV/EV, it's an obligation to ensure that all the secondary information attested is both correct _and_ was obtained via the allowed means (that is, it's active misissuance if the information is correct, but was obtained via improper means).

I can totally understand the desire, as a business, to be customer oriented, but first and foremost the duty of the CA is to be a trustworthy binder of keys and identities, and that means serving the larger Relying Party population before serving any customers. I hope that's not controversial to state the priorities in that order.

>> Could we balance the issue to say within 24 hours of public 
>> disclosure or within two weeks of receiving a certificate problem 
>> report where the CA confirms that one of the reasons under Section 4 applies?
>
> When we last discussed this, my understanding of the conclusion was
> that the CA would be afforded up to two weeks to investigate the
> problem report (with the requirement that additional details about why
> the delay occurred be made public), but that upon determining it met
> one of the revocation reasons, the CA was obligated to make the timely
> revocation under 4.9.1.1/4.9.1.2.
> [JR] What if the certificate was deployed to 1000+ servers? If the 
> private key is not disclosed, but there is a need to revoke, giving 
> them time to manage the revocation minimizes the impact on relying 
> parties. I generally don't need two weeks to investigate the problem. 
> What I need is two weeks to migrate the customer to a better
> practice, which is then followed by revocation.
>
> If I understand your proposal, it would be that if a CA determines the 
> Subscriber (or Subordinate CA) meets the requirements of 4.9.1.1, they 
> could still delay for up to two weeks before revoking, is that correct?
>
> [JR] Sort of - I propose 4.9.1.1(1) and 4.9.1.1(2) (and corresponding 
> sections under 4.9.2.2) require revocation within 24 hours. In both 
> cases, the customer requested revocation.  That really should be 
> immediate. What I'm proposing is the CA could delay two weeks for 
> things like a subscriber agreement breach or if certificate
> information changes (such as a change in address).

Well, we either believe this information is valuable and worthwhile to Relying Parties - in which case, at all points, we should strive to maintain its accuracy - or we believe it is not useful to Relying Parties, and we should seek to forbid CAs from introducing information, such as addresses, that may mislead them.

I'm not sure that we can both have and eat our certificate cake - if we want the bindings to be used, they need to be timely and accurate.
If we don't believe the bindings need to be timely and accurate, we should not allow them to be added.

> How does that benefit Relying Parties?
> [JR] It prevents critical systems from being shut off before the 
> server operator can migrate to a new certificate.  It's an attempt to 
> balance the risk of misuse (a non-public event is less risky) with the
> desire to keep all infrastructure secured. The result of revocation is 
> usually a move to no encryption rather than some new encryption - 
> especially if revocation was for something like a change in address.

I don't believe we should weaken revocation to address this issue.
That is, even as proposed, a server operator still must be prepared to rotate certificates within 24 hours under certain circumstances. So really, we should be focusing on making it easier for server operators to perform such rotations - and year round, so that we don't fall into the trap of holiday freezes, and can instead let robust automated systems do the work.

I have trouble with the argument that non-public events are less risky - that seems to be arguing security through obscurity - but perhaps I've misunderstood your point.

In some ways, you seem to be arguing that the status quo is good - that hand-installed certificates or manual difficulties with revocation are desirable properties to support - but shouldn't we, as an industry, instead hold to the principles (that the binding in a certificate is worthwhile and meaningful), and help the industry understand and adapt to that, rather than weaken the security afforded in order to accommodate those who otherwise lack incentive to invest in sufficient security?