Sunday, March 10, 2013

ZFS experiments sketch - some interesting commands

This is just a "notebook/cheatsheet" page for myself, to store some commands I used in my ZFS experiments.

zpool create -o ashift=12 tank /dev/disk/by-id/ata-ST31000340AS_5QJ0TXXH /dev/disk/by-id/ata-ST31000340AS_5QJ0WHFC
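For reference: `ashift` is the base-2 logarithm of the sector size ZFS uses for the vdevs, so `ashift=12` means 2^12 = 4096-byte sectors, matching 4K "Advanced Format" drives. A tiny sketch of that relationship (the helper names are mine, purely illustrative):

```python
def ashift_for_sector_size(sector_bytes: int) -> int:
    """Return the ashift (log2 of the sector size) for a power-of-two sector size."""
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1

def sector_size_for_ashift(ashift: int) -> int:
    """Inverse mapping, e.g. ashift=12 -> 4096-byte sectors."""
    return 1 << ashift

print(ashift_for_sector_size(512))   # 9
print(ashift_for_sector_size(4096))  # 12
print(sector_size_for_ashift(12))    # 4096
```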

zfs create tank/store

zfs set compression=on tank

zfs set sync=disabled tank
zfs set atime=off tank

^^ Taken from a thread in

root@desktop:/tank/store# zpool list
tank  1.81T  17.3G  1.80T     0%  1.00x  ONLINE  -

root@desktop:/tank/store# zfs get compressratio tank/store
tank/store  compressratio  1.12x  -

root@desktop:/tank/store# zpool iostat -v
                                capacity     operations    bandwidth
pool                         alloc   free   read  write   read  write
---------------------------  -----  -----  -----  -----  -----  -----
tank                         75.7G  1.74T    289    536  33.0M  61.0M
  ata-ST31000340AS_5QJ0TXXH  37.8G   890G    144    268  16.5M  30.5M
  ata-ST31000340AS_5QJ0WHFC  37.8G   890G    145    268  16.5M  30.5M
---------------------------  -----  -----  -----  -----  -----  -----

root@desktop:/tank/store# zfs list
tank         131G  1.66T   144K  /tank
tank/store   131G  1.66T   131G  /tank/store


Cupcakes, quite literally.

The boyfriend has kindly, and with an impressive degree of trust, allowed me to gatecrash his blog. What follows aims to be musings and notes on the more mundane aspects of life in HappyHouse. 

Btw, HappyHouse is our house share in central Brussels, populated by one BF from Russia, one DJ from Flanders and myself from Ireland.

Saturday, March 9, 2013

A comment on an article "IPv6 focus month at ISC: Filtering ICMPv6 at the border"

I started writing a comment on an ISC article about ICMPv6 filtering, but the text became so pathologically long that I decided to put it here as a blog entry instead.

I want to highlight a few parts of the text that, in my view, do not put enough emphasis on some very important unintended consequences.

"Blocking them will typically not disrupt connections, but harm performance. In IPv6, routers no longer fragment packets. Instead the source does, after receiving a Packet Too Big ICMPv6 message. If they are blocked at the firewall, connection can not be established."

The connection (if we are talking about plain TCP) will be established, in both IPv4 and IPv6, before any Packet Too Big ICMP message could be triggered: the packets involved in the three-way handshake are small in the vast majority of cases.

So the connection will be established. However, blocking ICMPv6 Packet Too Big messages will stall the connection as soon as the data sent exceeds the path MTU. The same holds for IPv4, where the equivalent message is ICMP "Fragmentation Needed".
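A toy model of the stall, to make the failure mode concrete (every name here is hypothetical, and TCP is reduced to "retransmit the same full-size segment"). The small handshake packets would pass either way; the difference only shows up once a 1500-byte segment meets a 1280-byte hop:

```python
from dataclasses import dataclass

IPV6_MIN_MTU = 1280

@dataclass
class Sender:
    mtu: int  # sender's current path MTU estimate (starts at its link MTU)

def send_once(sender: Sender, size: int, path_mtu: int, ptb_delivered: bool) -> bool:
    """Send one packet of up to `size` bytes; return True if it arrived.

    IPv6 routers never fragment: an oversized packet is dropped and a
    Packet Too Big (PTB) carrying the hop's MTU is sent back. If a firewall
    filters that PTB, the sender never learns the smaller MTU.
    """
    wire_size = min(size, sender.mtu)
    if wire_size <= path_mtu:
        return True
    if ptb_delivered:
        # Hosts are not required to go below 1280 (RFC 2460), hence the max().
        sender.mtu = max(path_mtu, IPV6_MIN_MTU)
    return False

def transfer(ptb_delivered: bool, retries: int = 5) -> bool:
    """Retransmit one full-size segment over a path with a 1280-byte hop."""
    sender = Sender(mtu=1500)
    return any(send_once(sender, 1500, path_mtu=1280, ptb_delivered=ptb_delivered)
               for _ in range(retries))

print(transfer(ptb_delivered=True))   # True: the retry fits after the PTB
print(transfer(ptb_delivered=False))  # False: every retransmission is dropped
```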

This is what is colloquially called a "PMTUD black hole".

In IPv4 you can get away with this by explicitly disabling Path MTU Discovery on the host (nearly all operating systems I know of enable it by default). Only then will performance indeed suffer due to fragmentation by the routers along the path, and this is exactly what you cannot do in IPv6, where routers never fragment.
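Here is roughly what that fragmentation costs: with DF clear, a router facing a smaller next-hop MTU splits each datagram, and every fragment except the last must carry a multiple of 8 data bytes. A sketch of the arithmetic, assuming a 20-byte IPv4 header without options (the function name is mine):

```python
IPV4_HEADER = 20  # bytes, assuming no IP options

def ipv4_fragments(total_len: int, mtu: int) -> list[tuple[int, int]]:
    """Split an IPv4 datagram of total_len bytes (header included) for a link MTU.

    Returns (fragment_offset_in_8_byte_units, fragment_total_len) pairs,
    roughly as a router would when the DF bit is clear.
    """
    payload = total_len - IPV4_HEADER
    max_data = (mtu - IPV4_HEADER) // 8 * 8  # non-final fragments carry multiples of 8
    frags = []
    offset = 0
    while offset < payload:
        data = min(max_data, payload - offset)
        frags.append((offset // 8, data + IPV4_HEADER))
        offset += data
    return frags

print(ipv4_fragments(1500, 1280))  # [(0, 1276), (157, 244)]
```

So each full-size 1500-byte packet crossing a 1280-byte hop becomes two packets, one of them small, and the receiver has to reassemble them.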

Hosts implementing Packetization Layer Path MTU Discovery (PLPMTUD, RFC 4821) may eventually recover from this condition.
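The core idea of PLPMTUD is to probe with packets of different sizes and treat a delivered (acknowledged) probe as proof that the size fits, without relying on ICMP at all. RFC 4821 discusses several search strategies; the sketch below simply uses a binary search between 1280 and 1500 bytes, and `probe` is a hypothetical callback standing in for "send a probe of this size and wait for its acknowledgement":

```python
from typing import Callable

def plpmtud_search(probe: Callable[[int], bool], low: int = 1280, high: int = 1500) -> int:
    """Binary-search the largest packet size that gets through.

    `low` is assumed to work (the IPv6 minimum MTU); `high` is the local
    link MTU. Every lost probe is treated as "too big".
    """
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # probe delivered: the path MTU is at least mid
        else:
            high = mid - 1  # probe lost: assume it was too big
    return low

# Example: a hypothetical path whose real MTU is 1400 bytes.
print(plpmtud_search(lambda size: size <= 1400))  # 1400
```

A real implementation must also tell a probe lost to congestion apart from one that was too big; this sketch does not try to.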

But for highly interactive applications like web browsing, this recovery takes quite a long time and has a negative impact on the user experience.
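A back-of-the-envelope sketch of why recovery is slow: if a lost full-size segment is only retried after a retransmission timeout, and the timeout doubles each time (the exponential backoff of RFC 6298), the waiting adds up quickly. The 1-second starting value (the RFC 6298 minimum RTO; some stacks use a lower floor) is an assumption, and how many timeouts a real host waits before falling back to smaller packets is implementation-specific:

```python
def time_until_retransmission(n: int, initial_rto: float = 1.0) -> float:
    """Seconds from the original send until the n-th retransmission,
    assuming the RTO starts at initial_rto and doubles after each timeout."""
    return sum(initial_rto * 2 ** i for i in range(n))

for n in range(1, 5):
    print(n, time_until_retransmission(n))
```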

Also, the Happy Eyeballs mechanism of RFC 6555 is explicitly unable to handle this condition, since it only races connection establishment, and the handshake succeeds. So both hosts that implement the "classic" behaviour (preferring IPv6) and hosts that implement Happy Eyeballs are affected.
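A simplified sketch of the race (not the full RFC 6555 algorithm): the client starts the IPv6 handshake, starts IPv4 after a short head start (assumed here to be 250 ms), and keeps whichever connection completes first. Because only the handshake is compared, an IPv6 path that black-holes large segments looks perfectly healthy. The `Path` type and its fields are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Path:
    family: str
    handshake_ms: int     # time to complete the TCP three-way handshake
    delivers_large: bool  # can full-size data segments get through?

def happy_eyeballs(paths: list[Path], ipv6_head_start_ms: int = 250) -> Path:
    """Pick the path whose handshake completes first; IPv4 starts later."""
    def finish_time(p: Path) -> int:
        delay = 0 if p.family == "IPv6" else ipv6_head_start_ms
        return delay + p.handshake_ms
    return min(paths, key=finish_time)

paths = [Path("IPv6", handshake_ms=40, delivers_large=False),  # PMTUD black hole
         Path("IPv4", handshake_ms=40, delivers_large=True)]
winner = happy_eyeballs(paths)
print(winner.family, winner.delivers_large)  # IPv6 False
```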

Thus, with a server NIC MTU above 1280, blocking ICMPv6 means being actively hostile to all users whose IPv6 connections have a lower MTU, even if they also have a "perfectly working" IPv4 path.
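To make the mismatch concrete: the MSS each side advertises in its SYN only reflects its own link MTU, so a server on a 1500-byte link will send full 1440-byte TCP payloads. A small sketch of the arithmetic, assuming no IPv6 extension headers and no TCP options (the helper name is mine):

```python
IPV6_HEADER = 40
TCP_HEADER = 20  # without TCP options

def ipv6_tcp_mss(mtu: int) -> int:
    """Largest TCP payload per segment for a given IPv6 MTU."""
    return mtu - IPV6_HEADER - TCP_HEADER

print(ipv6_tcp_mss(1500))  # 1440: what a server on a 1500-byte link sends
print(ipv6_tcp_mss(1280))  # 1220: what fits through a 1280-byte hop
```

If some hop in the middle only carries 1280 bytes, the server's full-size segments (1500 bytes on the wire) are dropped there, and only a Packet Too Big message can tell the server to shrink to 1220-byte payloads.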

Fully understanding this implication is critical before reading further, where I look at the potential consequences of trying to remediate it.

"Now there is one trick you can play to eliminate fragmentation of outbound traffic in IPv6: Set your MTU to 1280 bytes. This is the minimum allowed MTU for IPv6, and a packet 1280 bytes long will not require any fragmentation. But this comes at the cost of having to use more and smaller packets, again a performance penalty."

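To put a rough number on the "more and smaller packets" penalty from the quote, here is a sketch counting the TCP segments and header bytes needed to move 1 MB at a 1500-byte versus a 1280-byte MTU, assuming bare IPv6 (40 bytes) and TCP (20 bytes) headers with no options:

```python
import math

IPV6_HEADER = 40
TCP_HEADER = 20  # no TCP options assumed

def segments_needed(payload_bytes: int, mtu: int) -> int:
    """Number of full-MSS segments needed to carry payload_bytes."""
    mss = mtu - IPV6_HEADER - TCP_HEADER
    return math.ceil(payload_bytes / mss)

def header_overhead(payload_bytes: int, mtu: int) -> float:
    """Fraction of bytes on the wire spent on IPv6 + TCP headers."""
    headers = segments_needed(payload_bytes, mtu) * (IPV6_HEADER + TCP_HEADER)
    return headers / (headers + payload_bytes)

one_mb = 1_000_000
for mtu in (1500, 1280):
    print(mtu, segments_needed(one_mb, mtu), f"{header_overhead(one_mb, mtu):.2%}")
```

Under these assumptions that is 695 versus 820 segments, i.e. about 18% more packets for the same data.
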
At first glance, setting the MTU to 1280 does indeed look like a great measure. After all, the standard (RFC 2460) seems to give us carte blanche to do it:

   IPv6 requires that every link in the internet have an MTU of 1280
   octets or greater.  On any link that cannot convey a 1280-octet
   packet in one piece, link-specific fragmentation and reassembly must
   be provided at a layer below IPv6.

So all links will have an MTU of at least 1280, we will not need to perform Path MTU Discovery, and we neatly avoid all the caveats of blocking ICMPv6?

Not so fast. Let's take a look at another passage of interest in RFC 2460:

   It is strongly recommended that IPv6 nodes implement Path MTU
   Discovery [RFC-1981], in order to discover and take advantage of path
   MTUs greater than 1280 octets.  However, a minimal IPv6
   implementation (e.g., in a boot ROM) may simply restrict itself to
   sending packets no larger than 1280 octets, and omit implementation
   of Path MTU Discovery.

   In order to send a packet larger than a path's MTU, a node may use
   the IPv6 Fragment header to fragment the packet at the source and
   have it reassembled at the destination(s).  However, the use of such
   fragmentation is discouraged in any application that is able to
   adjust its packets to fit the measured path MTU (i.e., down to 1280
   octets).

   A node must be able to accept a fragmented packet that, after
   reassembly, is as large as 1500 octets.  A node is permitted to
   accept fragmented packets that reassemble to more than 1500 octets.
   An upper-layer protocol or application that depends on IPv6
   fragmentation to send packets larger than the MTU of a path should
   not send packets larger than 1500 octets unless it has assurance that
   the destination is capable of reassembling packets of that larger
   size.

   In response to an IPv6 packet that is sent to an IPv4 destination
   (i.e., a packet that undergoes translation from IPv6 to IPv4), the
   originating IPv6 node may receive an ICMP Packet Too Big message
   reporting a Next-Hop MTU less than 1280.  In that case, the IPv6 node
   is not required to reduce the size of subsequent packets to less than
   1280, but must include a Fragment header in those packets so that the
   IPv6-to-IPv4 translating router can obtain a suitable Identification
   value to use in resulting IPv4 fragments.  Note that this means the
   payload may have to be reduced to 1232 octets (1280 minus 40 for the
   IPv6 header and 8 for the Fragment header), and smaller still if
   additional extension headers are used.

Notice the last paragraph. Instead of sending smaller packets, your host switches to a totally different mode and starts sending so-called "atomic fragments": normal full-payload packets that additionally carry a Fragment header (offset 0, "more fragments" flag clear), which is otherwise absent from them. This gives an intermediate device the Identification value it needs to perform the fragmentation, just like in IPv4!
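For the curious, the Fragment header itself is just 8 bytes (RFC 2460, section 4.5). A sketch that builds one with Python's `struct` module; an "atomic fragment" is simply the case of offset 0 with the M ("more fragments") flag clear. The helper names and the Identification value are illustrative:

```python
import struct

NEXT_HEADER_TCP = 6  # IANA protocol number for TCP

def fragment_header(next_header: int, offset_units: int, more: bool, ident: int) -> bytes:
    """Build the 8-byte IPv6 Fragment extension header.

    Layout: Next Header (8 bits) | Reserved (8) | Fragment Offset (13) |
    Res (2) | M flag (1) | Identification (32), all in network byte order.
    """
    offset_and_flags = (offset_units << 3) | int(more)
    return struct.pack("!BBHI", next_header, 0, offset_and_flags, ident)

def atomic_fragment_header(next_header: int, ident: int) -> bytes:
    """Offset 0 and M clear: the whole packet is its own (only) fragment."""
    return fragment_header(next_header, offset_units=0, more=False, ident=ident)

hdr = atomic_fragment_header(NEXT_HEADER_TCP, ident=0x12345678)
print(len(hdr), hdr.hex())  # 8 0600000012345678

# Payload budget: 1280 - 40 (IPv6 header) - 8 (Fragment header)
print(1280 - 40 - len(hdr))  # 1232
```

The last line reproduces the 1232-byte payload figure from the RFC text.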

And here come some inconvenient questions:

Are you allowing IPv6 fragments to pass through your network?
Does the other party's firewall allow IPv6 fragments?

If not, you have just black-holed communication over any path whose MTU is smaller than 1280.

Hey, but didn't we just quote the very standard which says that every link must be able to carry packets of at least 1280 bytes? Why would we ever get into this strange condition?

Yes, we did. This all works until something starts adding bytes to the packet somewhere in the middle of the path.

Consider what happens when you have a "hardware" HTTP load balancer that is also doing TLS offload.

The TLS session is thus terminated on the load balancer, and the backend servers speak plain HTTP only.

TLS does two things: it "jumbles" (encrypts) the payload and adds the information necessary to "unjumble" it on the other side. This is done above the TCP layer.

You can do this most efficiently by minimising the work done by the TLS offloader, and the work is minimal when each cleartext segment is just small enough that both the "jumbled" payload and the "unjumbling information" fit into the path MTU towards the client. This still happens above TCP, so some accounting is needed, but the packet modification is simplest with this approach.
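To see why the backend ends up below 1280, here is a sketch of the per-packet budget. It assumes a TLS 1.1/1.2 AES-CBC cipher suite with HMAC-SHA1 (explicit 16-byte IV, 20-byte MAC, CBC padding) and bare IPv6/TCP headers; other cipher suites have different overheads, and the helper names are mine:

```python
IPV6_HEADER = 40
TCP_HEADER = 20          # no TCP options assumed
TLS_RECORD_HEADER = 5
CBC_BLOCK = 16           # AES block size
EXPLICIT_IV = 16         # per-record IV in TLS 1.1+ CBC suites (assumed suite)
MAC_LEN = 20             # HMAC-SHA1 (assumed suite)

def tls_record_len(plaintext: int) -> int:
    """Wire length of one TLS record carrying `plaintext` bytes."""
    body = plaintext + MAC_LEN + 1                # +1 for the padding-length byte
    padded = -(-body // CBC_BLOCK) * CBC_BLOCK    # round up to the block size
    return TLS_RECORD_HEADER + EXPLICIT_IV + padded

def max_cleartext_per_packet(path_mtu: int) -> int:
    """Largest cleartext chunk whose TLS record still fits in one IPv6/TCP packet."""
    budget = path_mtu - IPV6_HEADER - TCP_HEADER
    p = budget
    while p > 0 and tls_record_len(p) > budget:
        p -= 1
    return p

print(max_cleartext_per_packet(1280))  # 1163
```

Under these assumptions each client-facing packet can carry at most 1163 cleartext bytes, so the matching backend packet is 1163 + 60 = 1223 bytes: below the IPv6 minimum MTU.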

However, just before, we set the MTU on the TLS (client-facing) side to 1280. So the server needs to send its cleartext payload in packets smaller than 1280 bytes, and we need to signal this to the server, i.e. send it a Packet Too Big reporting an MTU below 1280!

So, we squarely hit the conditions to generate an IPv6 "atomic fragment"!

Thus, I pose the questions again:

Are you allowing IPv6 fragments to pass through your network?
Does the other party's firewall allow IPv6 fragments?

If not, you may well have just black-holed communication for your SSL clients.

You say: "No problem! Why don't the load balancers hack a bit more than they already do and just turn one TCP segment into two? It will all work!"

They may do so, but such an operation requires more state tracking than when segments on both sides correspond 1:1, which may well affect the performance or capacity of the box.
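A sketch of where that extra state comes from, in a deliberately simplified model that ignores the sequence-number shift TLS itself introduces and 32-bit wraparound (all names hypothetical): once one segment becomes two, client ACKs can land in the middle of an original segment.

```python
def split_segment(seq: int, payload: bytes, max_len: int) -> list[tuple[int, bytes]]:
    """Split one TCP segment into pieces of at most max_len bytes,
    each starting at its own sequence number (seq + byte offset)."""
    return [(seq + i, payload[i:i + max_len]) for i in range(0, len(payload), max_len)]

def fully_acked(ack: int, originals: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return the original (start_seq, length) segments completely covered by `ack`.

    With a 1:1 mapping every client ACK lines up with an original segment
    boundary. After a split, an ACK may end inside an original segment,
    so the box has to remember partial progress per segment.
    """
    return [(start, length) for start, length in originals if start + length <= ack]

pieces = split_segment(1000, b"x" * 1300, max_len=1163)
print([(s, len(p)) for s, p in pieces])             # [(1000, 1163), (2163, 137)]
print(fully_acked(2163, originals=[(1000, 1300)]))  # []: only the first piece is acked
print(fully_acked(2300, originals=[(1000, 1300)]))  # [(1000, 1300)]
```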

TLS offload is not the only potential failure scenario. Another dangerous case of MTU "impedance mismatch" involves some kinds of tunnelling. I am deliberately vague, because tunnelling comes in all shapes and forms.

MPLS, LISP, IPsec, GRE, L2TP, OTV and VXLAN in various deployment modes, to name but a few. This tunnelling might happen on your network or on someone else's, with or without your knowledge, and some of these protocols might fail only in certain deployments. The point is: this can happen.
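Each encapsulation eats into the MTU left for the inner packet. A sketch with a few well-known fixed overheads (outer headers without IP options or IPv6 extension headers; other tunnel types and modes differ, and the example paths are hypothetical):

```python
# Fixed per-packet encapsulation overheads in bytes.
OVERHEAD = {
    "6in4": 20,          # IPv6 packet carried directly in IPv4 (protocol 41)
    "GRE/IPv4": 24,      # outer IPv4 header (20) + basic GRE header (4)
    "IPv6-in-IPv6": 40,  # outer IPv6 header
    "MPLS label": 4,     # per label in the stack
}

IPV6_MIN_MTU = 1280

def inner_mtu(outer_mtu: int, *layers: str) -> int:
    """MTU left for the inner packet after stacking the given encapsulations."""
    return outer_mtu - sum(OVERHEAD[layer] for layer in layers)

for outer, layers in [(1500, ("GRE/IPv4", "MPLS label", "MPLS label")),
                      (1280, ("IPv6-in-IPv6",))]:
    mtu = inner_mtu(outer, *layers)
    status = "ok" if mtu >= IPV6_MIN_MTU else "below IPv6 minimum"
    print(outer, layers, mtu, status)
```

The second case is the dangerous one: the inner MTU drops below 1280, so the tunnel has to fragment and reassemble below IPv6, and if it does not, large packets are lost.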

So, if you are thinking of blocking all of ICMPv6, think again.

Have you considered all the potential consequences of your actions?

Please weigh the risk you are mitigating against the guaranteed harm to some of your end users. Security is all about trade-offs; otherwise you'd just install an A1 firewall and be done :-)

P.S. An explicit disclaimer: this is my personal blog post on my personal blog, written in my personal time on a Saturday evening. Therefore this whole article should be treated as my personal opinion, not that of my employer.