[Servercert-wg] Ballot SC28: Logging and Log Retention

Tue Jun 2 02:42:00 MST 2020

Ryan,

Thanks for the feedback. Comments inline.

On 27/05/2020 19:51, Ryan Sleevi wrote:
>
>
> Just a note: In the GitHub redline, this is repeated twice :) I
> realize it's not official, but it stood out ;)
Oops! Fixed.
>
>
>         The CA SHALL retain, for at least two years:
>
>
> My first glance when reading this is that it reads a little weird, and
> I can easily see it leading to misinterpretation. "For at least two
> years" reads as a period since the event happened, but that's not
> actually the case - it's for at least two years /following/ a
> different event (namely, destruction/revocation/expiration).
This is one we genuinely had a problem expressing unambiguously.
Your interpretation is correct: the retention time is two years
following the destruction/revocation/expiration event.
> I don't have a great concrete solution here, but if it helps settle
> any debates, it does seem confusing :P I'm loath to introduce a term
> like "log retention period", but that might help? e.g. "The log
> retention period for CA certificate and key lifecycle management event
> records shall be from the moment of the event until at least two years
> after the ..."?
Yeah - not sure that utterly removes confusion. I'm willing to work on
it some more though.
>
> 1) This is a little confusing on terminology. 5.4.1 stipulates the CA
> SHALL record, but then later in the same section, states "Log
> entries". Are "log entries" "log records"? Should we align the two terms.
>   Suggestion: Pick a term (i don't care which) and let's use it
> consistently, either as "log entries" or "log records", or highlight
> why they're distinct/different
Provisionally [meaning, I need to run this by the endorsers], I've fixed
my redline to use "log records" throughout.
>
> 2) There is no Section titled 5.4.1.1; there's a Section 5.4.1 with a
> bulleted list that has a 1. We don't really have an established
> notation here, but I don't think we've treated all numbered lists as
> inherently subsections (notoriously, 4.9.1.1's requirements are hard
> to cite).
>   Suggestion: Tease 5.4.1 into sub-sections, and move the requirements
> that follow the list into the top-level 5.4.1 as applying to all of
> those subsections?

Provisionally, I've changed 5.4.1.x to be 5.4.1 (x), hopefully to
indicate that x is not a formal section header, but an element in a list.

>     after either: the destruction of the CA
>
>
> CA _Private_ Key
Provisionally, fixed.
> There's an ambiguity here, which is when there are multiple
> certificates associated with a given key (e.g. a Root Certificate and
> a Subordinate CA that is a Cross-Certificate). The wording of this
> requirement implies a singular requirement ("the CA key"), which would
> seem to permit a CA arguing that they no longer have to retain the
> events after /one/ certificate expires, rather than after /all/
> certificates expire. This isn't the only section to face that
> challenge (e.g. https://github.com/cabforum/documents/issues/187 ),
> but it seems appropriate to try and tackle this.

So, my first attempt at a fix - and I grant you it's clunkier than an
Austin Allegro (substitute culture specific example of a bad car here) -
is to use this terminology:

    [...] or the revocation or expiration of the final CA Certificate in
    that set of Certificates sharing a common Public Key which
    corresponds to the CA Private Key

Does this allow wiggle room which we would rather not allow? Is there
something more succinct that we can use (highly probable)?
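
To make the proposed wording concrete, here is a minimal sketch of how the certificate-set branch of the rule could be evaluated. It is only an illustration under stated assumptions: the names `CaCertificate`, `add_years` and `earliest_disposal` are hypothetical, the key-destruction branch is deliberately not modelled, and a certificate is treated as finished at the earlier of its revocation and its expiry.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class CaCertificate:
    """Hypothetical record for one CA Certificate bound to the CA key."""
    not_after: datetime
    revoked_at: Optional[datetime] = None

    def end_of_validity(self) -> datetime:
        # Assumption: revocation before expiry ends the certificate early.
        if self.revoked_at is not None and self.revoked_at < self.not_after:
            return self.revoked_at
        return self.not_after


def add_years(moment: datetime, years: int) -> datetime:
    """Add calendar years, mapping 29 February to 28 February if needed."""
    try:
        return moment.replace(year=moment.year + years)
    except ValueError:
        return moment.replace(year=moment.year + years, day=28)


def earliest_disposal(certs_sharing_key: Sequence[CaCertificate],
                      retention_years: int = 2) -> datetime:
    """Earliest date the lifecycle records could be discarded.

    The clock starts when the *final* certificate in the set sharing the
    Public Key is revoked or expires, not when the first one does.
    """
    if not certs_sharing_key:
        raise ValueError("at least one CA Certificate is required")
    final_end = max(cert.end_of_validity() for cert in certs_sharing_key)
    return add_years(final_end, retention_years)


# A Root Certificate and a revoked Cross-Certificate sharing one key.
root = CaCertificate(not_after=datetime(2030, 1, 1))
cross = CaCertificate(not_after=datetime(2025, 6, 1),
                      revoked_at=datetime(2024, 3, 1))
print(earliest_disposal([root, cross]))
```

Taking the maximum over the whole set is what closes the loophole described above: revoking the Cross-Certificate alone does not start the two-year clock while the Root Certificate is still valid.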

>
>
>             3. any security event records (as set forth in Section
>     5.4.1.3)
>     after the event occurred.
>
>
> I am most uneasy about this. I think it's understandable with respect
> to firewall/router logs the challenges faced by CAs, but I'm deeply
> concerned about things like CA entry/exit. The problem is the system
> event logs are relevant to the overall trust in the CA, and you really
> want to make sure you have the lifecycle history... for the CA
> certificate.

Would having the entry/exit records from many years ago really help
demonstrate trust in the CA, though? Given that the CA certificate can
last 15-20 years, I'd be interested to hear what forensic value a
10-year-old CA facility entry/exit record would have. I'm *not* saying that
there is *no* value in retaining records indefinitely - the question is
what forensic value they have versus the cost of their maintenance. Do
you think that some of the system records need their retention extended
to cover the lifetime of all CA Certificates? In that case, should we move
the facility entry/exit records into 5.4.1 (1)?

[Note: this is not meant to be pejorative or ad absurdum reasoning -
it's a genuine query to get other viewpoints on what information, in
quality and quantity, is useful to those wishing to evaluate the
trustworthiness of a CA]
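
To put rough numbers on the trade-off, the following sketch compares how long a single facility entry/exit record would be kept under each classification. Everything in it is an assumption for illustration: the rule labels, the helper names and the dates are hypothetical, with the CA certificate lifetime chosen from the 15-20 year range mentioned above.

```python
from datetime import date
from typing import Optional


def add_years(day: date, years: int) -> date:
    """Add calendar years, mapping 29 February to 28 February if needed."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


def retain_until(event_date: date, rule: str,
                 final_ca_cert_end: Optional[date] = None,
                 years: int = 2) -> date:
    """Retention end date under one of two hypothetical rule labels."""
    if rule == "security_event":
        # 5.4.1 (3) style: the clock starts at the event itself.
        return add_years(event_date, years)
    if rule == "ca_lifecycle":
        # 5.4.1 (1) style: the clock starts when the final CA Certificate
        # for the key is revoked or expires.
        if final_ca_cert_end is None:
            raise ValueError("final_ca_cert_end is required for this rule")
        return add_years(final_ca_cert_end, years)
    raise ValueError(f"unknown rule: {rule}")


entry_exit = date(2021, 3, 15)      # hypothetical door-access record
ca_cert_expiry = date(2040, 6, 2)   # hypothetical 20-year CA certificate

as_security = retain_until(entry_exit, "security_event")
as_lifecycle = retain_until(entry_exit, "ca_lifecycle", ca_cert_expiry)
print(as_security, as_lifecycle)
```

With these assumed dates the same record is kept for two years under the security-event rule and for roughly twenty-one years if it is moved into 5.4.1 (1), which is the cost side of the question being asked.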

> I may not be fully appreciating the scope of the challenge, and I know
> it'll be laughable coming from a Googler, but "terabytes of data" does
> not sound 'that' hard, especially given the vital role of CAs.

The actual quantity of the data is not particularly significant - it's
the indexing, searching and ability to produce meaningful audit reports
out of the morass which becomes problematic. An ELK stack or Splunk
stack is fine, but the processing requirements get pretty hairy.

From an auditor's perspective, to get the evidence that a CA is sticking
to its policies, you need to be able to tease out meaningful logs.

> It seems like the focus on this ballot is "If there's something to
> catch, we would have caught it sooner", but I'm moreso interested in
> the "If something goes wrong, we can really understand how, where, and
> why it went wrong". I can appreciate the argument that no one would be
> patient enough to wait two years before doing shenanigans, but some of
> these things (like Certificate Profiles) are totally things that have
> gone years before shenanigans/issues caught.

I think that the "two years to begin an exploit" scenario is unlikely
indeed; the references to the "Cost of a Data Breach" report give the
likely lifetimes of an exploit.

It would help me if you could expand a bit on Certificate Profile
issues. I mean, I can guess that what you (meaning a Root Program) are
after is the root cause analysis of a problem; does two years' retention
of the change logs really frustrate the forensics versus a seven-year
retention?

> So one obvious implication to this is that CAs no longer have to log
> what software is installed on their system. As long as it's not
> "security" software, the CA would argue that under 5.4.1 (3)(2),
> installing software on the CA is not itself the PKI system (e.g.
> EJBCA) or the security system (e.g. the firewall/audit logging), ergo
> doesn't need to be logged.
>
Ah - I'm not sure that this is the case though. Any audit begins with a
system description, and the entire software manifest of a host is
definitely part of that description. Following SC29, changes to the
software manifest must follow a change control process, which documents
the changes being proposed - I would have thought that if change logs
were not retained that would be bizarre in the extreme; after all, if
you didn't have historical logs of changes, how could you possibly
demonstrate that you even _have_ a change control process?

The review part of the change must surely entail the new manifest being
recorded (otherwise, how would you know if the change worked or not?).

In any case, the vague term "log system activity" in the existing NCSSRs
seems to allow enough argument to say that software changes don't
really constitute "activity" but rather "configuration". A perverse
argument, I concede, but I'm not sure that the BR alignment actually
allows software manifest changes to not be logged.

> I definitely appreciate the effort to bring more of the NCSSRs in
> harmony with the BRs, potentially permitting their eventual
> integration of the useful bits, but some of the places of broad
> applicability are useful (for Root Programs), even if burdensome (for
> CAs). 

Placing burdens on a CA in order to retain trust is entirely reasonable;
the question is whether the burdens are of practical utility. The
feeling from the NetSec group is that some of these burdens, while
computationally and organisationally _feasible_ (albeit _onerous_), don't
actually serve much _use_ in demonstrating trustworthiness.

In any case, a new (again, provisional!) redline is here:

https://github.com/cabforum/documents/compare/16a5a9b...neildunbar:0e02b27

I'll run those changes past the endorsers, and if we're good, roll a v2
of the ballot into the discussion. I suspect we'll need another few
go-arounds, but this was all valuable feedback.

Thanks,

Neil
