Thursday, May 9, 2013

ISC DHCPD programming fun: (attempting to) install routes following DHCP-PD operation

For testing, I need to run both the DHCPv6 server and the router on a single box, with the CPEs behind it doing client DHCP-PD requests.

As such a box I am using a Linksys E3200 running TomatoUSB, with optware aiccu in order to flexibly get a globally routable IPv6 chunk, and optware dhcpd to do all the DHCPv6 stuff.

Configuring DHCP-PD on the ISC DHCP server was a breeze, but the tricky part is routing the traffic - DHCP-PD implies the prefix is behind a certain CPE, which implies a route... Some searching pointed to one approach, but it seems to be a no-op with IPv6... duh. However, a blog entry by Wim (@42wim) hinted at another approach...

The result is the ugly hack below, which might be enough for my needs (I won't change the CPEs too often and they are all OpenWRT, so the initial 5-minute delay should be fine; and I have not looked at the security aspect at all - this just blindly installs routes based on the info supplied by the client in the REBIND, which in any real-world deployment would be a no-no)...

Also, since I derive the link-local address from the MAC address, which I in turn extract in a way similar to Wim's approach, this is all very fragile - but again, for my toy setup it is enough.

The funniest part was flipping the U/L bit while converting the MAC address into the link-local IPv6 address. Luckily, ISC DHCP implements just enough logic for it to work. (Though I would have liked a decent programming language there... Maybe the ISC folks could just hook Lua inside ? I suspect this might actually decrease the amount of code, and with something like Lua 5.1 the bugs are very rare, if present at all.)
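For illustration, here is roughly the same MAC-to-link-local conversion in plain Python (a hypothetical helper, not part of the dhcpd config; it flips the U/L bit with an XOR instead of the addition-overflow trick the config has to resort to):

```python
def mac_to_link_local(mac: str) -> str:
    """Derive the EUI-64 based IPv6 link-local address from a MAC address."""
    b = bytes(int(x, 16) for x in mac.split(":"))
    # Flip the Universal/Local bit in the first byte, insert ff:fe in the middle
    eui64 = bytes([b[0] ^ 0x02]) + b[1:3] + b"\xff\xfe" + b[3:6]
    # Render 16-bit groups without leading zeros, like binary-to-ascii(16,16,...)
    groups = [f"{(eui64[i] << 8) | eui64[i + 1]:x}" for i in range(0, 8, 2)]
    return "fe80::" + ":".join(groups)

print(mac_to_link_local("00:11:22:33:44:55"))  # fe80::211:22ff:fe33:4455
```

Note this, like the config, does not do proper zero-compression of the groups - it merely drops leading zeros, which is what dhcpd's binary-to-ascii produces as well.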

If someone has a better idea on how to implement this, feel free to write it in the comments. My scenario constraints are:

1) the ISC DHCP server is whatever is supplied with the TomatoUSB firmware - I believe 4.1 - and I cannot recompile or upgrade it to anything newer.

2) I do not have any cross-compiler toolchain for TomatoUSB

So I am pretty restricted in what can be done.

Anyway, here goes my dhcpd.conf for your enjoyment:

# BIIIG caveat:
# * does not work on initial allocation.
#   looks like we can only study the options sent by the client, 
#   not the options about to be sent by the server, so...
# Some assumptions:
# * DUID has mac address in the end (openwrt does this)
# * only one prefix in PD, and one address in NA. This is a bit buggy
default-lease-time 600;
max-lease-time 7200; 
log-facility local7; 

 # thanks @42Wim! :-)

 option dhcp6.macaddr code 193 = string;
 option dhcp6.leased-address code 194 = string;
 option dhcp6.pkt code 9998 = string;
 option dhcp6.leased-prefix code 9999 = string;
 option dhcp6.leased-prefix-len code 9997 = string;
 option dhcp6.ll-addr code 9996 = string;
 option dhcp6.leased-prefix-cidr code 9995 = string;
 option dhcp6.uli code 6011 = integer 32;
 option dhcp6.ulo code 6010 = integer 32;
 option dhcp6.macaddr = binary-to-ascii(16, 8, ":", 
                                        suffix(option dhcp6.client-id, 6));
 # extract the byte with U/L bit
 option dhcp6.uli = substring(suffix(option dhcp6.client-id, 6), 0, 1);

 # invert the U/L bit by checking the symbol in the binary string 
 # representation and adjusting the result accordingly
 if substring(suffix(binary-to-ascii(2, 8, "",
                                  config-option dhcp6.uli), 2), 0, 1) = "1" {
   # Seems there's no minus so we gotta do subtraction by addition overflow
   option dhcp6.ulo = encode-int(
                         extract-int(config-option dhcp6.uli, 8) + 254, 8);
 } else {
   option dhcp6.ulo = encode-int(
                         extract-int(config-option dhcp6.uli, 8) + 2, 8);
 }
 option dhcp6.ll-addr = concat("fe80::", binary-to-ascii(16,16, ":", 
                                 concat(config-option dhcp6.ulo, 
                                        substring(suffix(option dhcp6.client-id, 6), 1, 2), 
                                        encode-int(255, 8),
                                        encode-int(254, 8),
                                        substring(suffix(option dhcp6.client-id, 6), 3, 3))) );
 option dhcp6.leased-prefix = binary-to-ascii(16,16, ":",
                                suffix(substring(option dhcp6.ia-pd, 12, 100), 16));
 option dhcp6.leased-prefix-len = binary-to-ascii(10,8, ".",
                                    substring(suffix(substring(option dhcp6.ia-pd, 12, 100), 17), 0, 1));
 option dhcp6.leased-prefix-cidr = concat (config-option dhcp6.leased-prefix, "/", 
                                           config-option dhcp6.leased-prefix-len);
 if substring(config-option dhcp6.leased-prefix, 0, 1) = "2" {
   log (info, concat ("Prefix ",config-option dhcp6.leased-prefix-cidr, 
                      " leased to ", config-option dhcp6.macaddr, 
                      " via ", config-option dhcp6.ll-addr));
   execute("/usr/sbin/ip", "-6", "route", "del", config-option dhcp6.leased-prefix-cidr);
   execute("/usr/sbin/ip", "-6", "route", "add", config-option dhcp6.leased-prefix-cidr, 
                                                 "via", config-option dhcp6.ll-addr, "dev", "br0");
 } else {
   log (info, "No prefix leased in this packet or not REBIND");
 }

# pretty standard stuff.

subnet6 2001:6f8:147e::/64 {
        # Range for clients
        range6 2001:6f8:147e::1000 2001:6f8:147e::ffff;
        # Additional options
        option dhcp6.name-servers 2001:6f8:147e::1;
        option dhcp6.domain-search "domain.example";

        # Prefix range for delegation to sub-routers
        prefix6 2001:6f8:147e:1100:: 2001:6f8:147e:1f00::  /56;
}
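For reference, the substring/suffix gymnastics on dhcp6.ia-pd above follow the IA_PD wire format from RFC 3633: 4 bytes of IAID, 4 bytes T1, 4 bytes T2, then an IAPREFIX sub-option whose last 17 bytes are the prefix length and the 16-byte prefix. A rough Python equivalent (a sketch assuming a single IAPREFIX sub-option, exactly as the config does; parse_ia_pd is a made-up name):

```python
import ipaddress
import struct

def parse_ia_pd(ia_pd: bytes) -> str:
    """Extract "prefix/len" from a DHCPv6 IA_PD option body (RFC 3633),
    assuming it carries exactly one IAPREFIX sub-option."""
    # IA_PD header: IAID (4), T1 (4), T2 (4) -- what substring(..., 12, 100) skips
    _iaid, _t1, _t2 = struct.unpack_from(">III", ia_pd, 0)
    # IAPREFIX sub-option header: option code 26, option length
    code, _length = struct.unpack_from(">HH", ia_pd, 12)
    if code != 26:
        raise ValueError("expected an IAPREFIX sub-option")
    # preferred lifetime (4), valid lifetime (4), prefix length (1), prefix (16)
    _pref, _valid, plen = struct.unpack_from(">IIB", ia_pd, 16)
    prefix = ipaddress.IPv6Address(ia_pd[25:41])
    return f"{prefix}/{plen}"
```

The config's suffix(..., 16) is the 16-byte prefix at the end, and the single byte at suffix(..., 17) is the prefix length right before it - the same slicing as above.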

Sunday, March 10, 2013

ZFS experiments sketch - some interesting commands

This is just a "notebook/cheatsheet page" for myself. Just to store some commands I used for my ZFS experiments.

zpool create -o ashift=12 tank /dev/disk/by-id/ata-ST31000340AS_5QJ0TXXH /dev/disk/by-id/ata-ST31000340AS_5QJ0WHFC

zfs create tank/store

zfs set compression=on tank

zfs set sync=disabled tank
zfs set atime=off tank

^^ Taken from a forum thread.

root@desktop:/tank/store# zpool list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tank  1.81T  17.3G  1.80T     0%  1.00x  ONLINE  -

root@desktop:/tank/store# zfs get compressratio tank/store
NAME        PROPERTY       VALUE  SOURCE
tank/store  compressratio  1.12x  -

root@desktop:/tank/store# zpool iostat -v
                                capacity     operations    bandwidth
pool                         alloc   free   read  write   read  write
---------------------------  -----  -----  -----  -----  -----  -----
tank                         75.7G  1.74T    289    536  33.0M  61.0M
  ata-ST31000340AS_5QJ0TXXH  37.8G   890G    144    268  16.5M  30.5M
  ata-ST31000340AS_5QJ0WHFC  37.8G   890G    145    268  16.5M  30.5M
---------------------------  -----  -----  -----  -----  -----  -----

root@desktop:/tank/store# zfs list
NAME         USED  AVAIL  REFER  MOUNTPOINT
tank         131G  1.66T   144K  /tank
tank/store   131G  1.66T   131G  /tank/store


Cupcakes, quite literally.

The boyfriend has kindly, and with an impressive degree of trust, allowed me to gatecrash his blog. What follows aims to be musings and notes on the more mundane aspects of life in HappyHouse. 

Btw, HappyHouse is our home share in central Brussels populated by one BF from Russia, one DJ from Flanders and myself from Ireland.

Saturday, March 9, 2013

A comment on an article "IPv6 focus month at ISC: Filtering ICMPv6 at the border"

I started to write a comment on an ISC article about ICMPv6 filtering, but the text of the comment became so pathologically long that I decided to put it here as a blog entry.

I wanted to highlight a few parts of the text that I consider as not putting enough emphasis on the very important unintended consequences.

"Blocking them will typically not disrupt connections, but harm performance. In IPv6, routers no longer fragment packets. Instead the source does, after receiving a Packet Too Big ICMPv6 message. If they are blocked at the firewall, connection can not be established."

The connection (if we are talking plain TCP), both in IPv4 and IPv6, will be completed before any Packet Too Big ICMP can be sent - the packets involved in the three-way handshake are small the vast majority of the time.

So, the connection will be established. However, blocking ICMPv6 Packet Too Big will result in a deadlock on the connection as soon as the data sent exceeds the path MTU - both in IPv4 and IPv6.

This is what is colloquially called a "PMTUD black hole".

In IPv4 you can get away with this by explicitly disabling Path MTU discovery on the host - by default, nearly every OS I know of has it enabled. This is when the performance is indeed impacted due to fragmentation; and this is precisely what you cannot do in IPv6.
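On Linux, for example, that IPv4-only escape hatch is a sysctl (shown here as a sysctl.conf fragment; tellingly, there is no IPv6 counterpart):

```
# IPv4 only: stop setting the DF bit, i.e. disable Path MTU discovery
net.ipv4.ip_no_pmtu_disc = 1
```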

The hosts implementing PLPMTUD (RFC4821) may eventually recover from this condition.

But for highly interactive applications like web browsing this recovery will take quite a long time and will have a negative impact on the user experience.

Also, the mechanisms in RFC6555 are explicitly unable to handle this condition - so both the hosts that implement the "classic" behavior (preferring IPv6) and the hosts that implement "Happy Eyeballs" are affected.

Thus, with a server NIC MTU above 1280, blocking ICMPv6 means being actively hostile to all users with lower-MTU IPv6 connections - even if they also have a "perfectly working" IPv4 path.

Fully understanding this implication is critical before reading further, where I look at the potential consequences of trying to remediate it.

"Now there is one trick you can play to eliminate fragmentation of outbound traffic in IPv6: Set your MTU to 1280 bytes. This is the minimum allowed MTU for IPv6, and a packet 1280 bytes long will not require any fragmentation. But this comes at the cost of having to use more and smaller packets, again a performance penalty."

Setting the MTU to 1280 is at first glance indeed a great measure; after all, the standard seems to give us carte blanche to do it:

   IPv6 requires that every link in the internet have an MTU of 1280
   octets or greater.  On any link that cannot convey a 1280-octet
   packet in one piece, link-specific fragmentation and reassembly must
   be provided at a layer below IPv6.

So - all the links will have the MTU of at least 1280, thus we will not need to perform the Path MTU discovery, and we neatly avoid all the caveats of blocking the ICMPv6 ?

Not so fast. Let's take a look at another passage of interest in RFC 2460:

   It is strongly recommended that IPv6 nodes implement Path MTU
   Discovery [RFC-1981], in order to discover and take advantage of path
   MTUs greater than 1280 octets.  However, a minimal IPv6
   implementation (e.g., in a boot ROM) may simply restrict itself to
   sending packets no larger than 1280 octets, and omit implementation
   of Path MTU Discovery.

   In order to send a packet larger than a path's MTU, a node may use
   the IPv6 Fragment header to fragment the packet at the source and
   have it reassembled at the destination(s).  However, the use of such
   fragmentation is discouraged in any application that is able to
   adjust its packets to fit the measured path MTU (i.e., down to 1280
   octets).

   A node must be able to accept a fragmented packet that, after
   reassembly, is as large as 1500 octets.  A node is permitted to
   accept fragmented packets that reassemble to more than 1500 octets.
   An upper-layer protocol or application that depends on IPv6
   fragmentation to send packets larger than the MTU of a path should
   not send packets larger than 1500 octets unless it has assurance that
   the destination is capable of reassembling packets of that larger
   size.

   In response to an IPv6 packet that is sent to an IPv4 destination
   (i.e., a packet that undergoes translation from IPv6 to IPv4), the
   originating IPv6 node may receive an ICMP Packet Too Big message
   reporting a Next-Hop MTU less than 1280.  In that case, the IPv6 node
   is not required to reduce the size of subsequent packets to less than
   1280, but must include a Fragment header in those packets so that the
   IPv6-to-IPv4 translating router can obtain a suitable Identification
   value to use in resulting IPv4 fragments.  Note that this means the
   payload may have to be reduced to 1232 octets (1280 minus 40 for the
   IPv6 header and 8 for the Fragment header), and smaller still if
   additional extension headers are used.

Notice the last paragraph. Instead of sending smaller packets, your host goes into a totally different mode and starts sending so-called "atomic fragments" - normal full-payload packets that carry a Fragment extension header which would otherwise be absent. This allows an intermediate IPv6-to-IPv4 translator to perform fragmentation, just like in IPv4!
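A quick sanity check on the numbers in that last paragraph: the Fragment extension header is exactly 8 octets (next header, reserved byte, offset/flags, identification), so 1280 minus the 40-octet IPv6 header minus 8 leaves 1232. A small Python sketch (field layout per RFC 2460 section 4.5; the identification value here is arbitrary):

```python
import struct

def atomic_fragment_header(next_header: int, ident: int) -> bytes:
    """Build the 8-byte IPv6 Fragment extension header of an "atomic
    fragment": fragment offset 0 and the M (more fragments) flag clear."""
    offset_and_flags = (0 << 3) | 0   # offset = 0, M = 0
    return struct.pack(">BBHI", next_header, 0, offset_and_flags, ident)

IPV6_MIN_MTU, IPV6_HDR = 1280, 40
hdr = atomic_fragment_header(6, 0x12345678)   # 6 = TCP
# 1280 - 40 - 8 = 1232 octets left for the payload, as the RFC notes
print(IPV6_MIN_MTU - IPV6_HDR - len(hdr))  # 1232
```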

And here come some inconvenient questions:

Are you allowing the IPv6 fragments to pass through your network ?
Does the other party's firewall allow IPv6 fragments ?

If not - you've just blackholed the communication for any path MTU that is smaller than 1280.

Hey, but didn't we just quote the very standard which said that every link is supposed to be able to transmit at least 1280 bytes of data ? Why would we ever get into this strange condition ?

Yes we did. This all works until you start to add the payload to the packet somewhere in the middle of the path.

Consider what happens when you have a "hardware" HTTP load balancer that is also doing TLS offload.

Thus, the TLS session is terminated on the load balancer, and the backend servers talk HTTP only.

TLS is doing two things: "jumbling" the payload and adding the necessary information to "unjumble" it on the other side. This is done above the TCP session level.

However, you can do this most efficiently by minimising the amount of work done by the TLS offloader - and the work is minimal when the cleartext payload is just small enough that both the "jumbled" payload and the "unjumbling" information fit into the path MTU towards the client. You still need to operate above the TCP session, so some accounting is required, but the packet modification is simplest with this approach.

However, just shortly before, we set the MTU on the TLS side to 1280. So the server needs to send its cleartext payload in packets smaller than 1280 bytes - and we need to signal this to the server!

So, we squarely hit the conditions to generate an IPv6 "atomic fragment"!

Thus, I pose the questions again:

Are you allowing the IPv6 fragments to pass through your network ?
Does the other party's firewall allow IPv6 fragments ?

If not - maybe you have just blackholed the communication for your SSL clients.

You say - "No problem, why don't the load balancers hack a bit more than what they are already doing and just turn one TCP segment into two ? It will all work."

They may do so, but such an operation will require more state tracking than if the segments on both sides correspond 1:1. This quite possibly will affect the performance or capacity of the box.

The TLS offload is not the only potential failure scenario. Another dangerous situation with MTU "impedance mismatch" would involve some kinds of tunnelling. I am deliberately vague, because the tunnelling can come in all shapes and forms.

MPLS, LISP, IPSec, GRE, L2TP, OTV, VXLAN, in various deployment modes - to name but a few. This tunnelling might happen either on your or on others' network, with or without your knowledge. Also some of the protocols might exhibit failure only in some deployments. The point is - this can happen.

So, if you are thinking of blocking all of ICMPv6, think again.

Have you considered all the potential consequences of your actions ?

Please weigh the risk that you are mitigating against the guaranteed harm to some of the end users - security is all about tradeoffs; otherwise you'd just install an A1 firewall and be done :-)

P.S. An explicit disclaimer: This blogpost is my personal blogpost on my personal blog. It was made during my personal time on a Saturday evening. Therefore all of this article should be treated as my personal opinion, and not of my employer.

Thursday, January 17, 2013

Random Ramblings about DDoS and content-oriented networking...

This is not intended to be original in any form; as usual, it is mostly to capture my own thought processes in an exhibitionist (uh... the spell checker does not accept any variant of this word, but you get what I am talking about) kind of way. With that, let's proceed.

So, first wild assumption: there are no useful (human undetectable) cryptographic hash collisions. So we can assume that one can run over every piece of content in the world and enumerate them by sha1(what_you_get_from_network).

So, with that assumption in mind, let's just throw away the entire host-based URI thing and use the hash to address the content. Seriously - I either click on my URLs or search for them, so the hostname really does not matter. Let's call this new scheme "hash", as in hash://sha1:4e1243bd22c66e76c2ba9eddc1f91394e57f9f83/

This is shorter than a lot of existing URLs. Moreover, with our first assumption, you can actually tell that as soon as you get anything other than "test" in response - it's the wrong content.
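The verification step is trivial; a sketch in Python (verify is a made-up helper name; note the example hash happens to be sha1 of "test" followed by a newline, i.e. exactly what `echo test` produces):

```python
import hashlib

def verify(url_hash: str, content: bytes) -> bool:
    """Check fetched content against the hash embedded in a hash:// URL."""
    return hashlib.sha1(content).hexdigest() == url_hash

h = "4e1243bd22c66e76c2ba9eddc1f91394e57f9f83"
print(verify(h, b"test\n"))    # the advertised content checks out
print(verify(h, b"tampered"))  # anything else is rejected on sight
```

Any cache along the way can run the same check, which is the whole point: the content authenticates itself, regardless of who served it.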

So, if you were an SP transiting this content - you could trivially cache the whole thing, and give near-instantaneous replies for the content you have already seen.

Of course, this whole new hash:// protocol is a non-starter - even with the current catastrophic worldwide finances, I probably have more chances to retire than to see it ever implemented. Why ?

Because http://4e1243bd22c66e76c2ba9eddc1f91394e57f9f83.sha1.subdomain.tld/ could serve just as well...

Provided that it were agreed upon, such a "subdomain.tld" would mean that the originator says:

1) "I authorise anyone to make a cached copy of the content, based on the subdomain name to derive the algorithm, and with the hash value as specified in the sub-subdomain, and to serve that as an immediate replacement for the content that I am offering at that URL."

2) "I authorize the MITM on the subdomain.tld domain by your name servers, and redirecting the lookups to whatever the content servers are at your install - given that the client can verify the content anyway".

Of course you would not use this kind of URL right away or when the content is changing - so you'd either use an iframe-based container, or a redirect.

But, this seems to allow a few gradual steps towards the content-driven static data:

1) dedicated domain, same as now, RTT ~ 50..200ms

2) caching at the ISP level, RTT ~30ms.

3) caching at the LAN level, RTT ~1ms.

4) caching at the user agent level, RTT ~0.1ms.

So overall it seems like a doable and reasonable approach with fallback ?


p.s. This was for static content. You can do similar tricks for the code, but that's a topic for another day...

Saturday, January 5, 2013

ifConfig 1.0 for iOS - a simple free utility to show the interface addresses

I've finished building and submitted to the App Store another nice little utility, which I have called "ifConfig" - a very simple way to see the IPv4 and IPv6 addresses assigned to the interfaces, and to copy them to the clipboard.

It's nothing fancy but mostly a convenient way to see the addresses at once.

For now, only the addresses are shown, with no subnet mask/prefix length or any other information - the goal is a very simple and very clean single-screen interface functionality.

Thursday, January 3, 2013

WiFi RoamTracker 1.0

The RoamTracker is a relatively simple iOS app that I needed for some debugging of my own, and I thought it might be useful to others as well.

Basically, it is a polling loop which takes note of the current SSID and BSSID every second, maintains a history of the changes, and gives a short click sound with each change.

It also has a button to mail the contents of the history as a CSV text via your configured mail account.

That is pretty much all that there is to it, for now.

If you are using it and have feedback or bug reports, feel free to leave them here in the comments.