Understanding IP Fragmentation

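A while ago I spent some time with the libnids source code and read through the project's documentation, which recommends the paper "Eluding Network Intrusion Detection". The paper gives a fairly thorough overview of the problems IDS systems face, and IP fragmentation is one of them. I had already heard of it because of nmap, and I have been brushing up on TCP/IP lately, so this seemed like a good opportunity to look at it properly.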

Protocol

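IP fragmentation is covered in great detail in "TCP/IP Illustrated, Volume 1". In short: the link layer imposes a maximum transmission unit (MTU) on every frame, so if an IP datagram is larger than the MTU, it is split into several smaller IP packets, which are then transmitted separately.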

Following the book, here is a simple diagram drawn with graphviz:

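When a datagram is fragmented, the fragments are described mainly by the offset and flags fields. The id field matters as well: it tells the receiver which original datagram a fragment belongs to, although the code excerpts quoted later in this post mostly deal with offset and flags. Put simply: every fragment except the last one has the "more fragments" (MF) flag set, and the offset field records where that fragment's data sits within the original datagram, counted in units of 8 bytes.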
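
To make the field layout concrete, here is a minimal C sketch of how the 16-bit flags/offset field can be decoded. The bit masks follow RFC 791; the macro and function names are made up for this illustration and do not come from nmap or libnids.

```c
#include <stdint.h>

/* Layout of the 16-bit flags/fragment-offset field in host byte order
 * (RFC 791): one reserved bit, DF, MF, then a 13-bit offset that counts
 * in units of 8 bytes. */
#define FRAG_DF      0x4000u  /* don't fragment */
#define FRAG_MF      0x2000u  /* more fragments follow */
#define FRAG_OFFMASK 0x1fffu  /* 13-bit fragment offset */

/* Hypothetical helper: byte offset of this fragment's payload within the
 * original datagram. The argument must already be in host byte order. */
unsigned frag_byte_offset(uint16_t off_field) {
    return (unsigned)(off_field & FRAG_OFFMASK) * 8u;
}

/* Hypothetical helper: 1 if more fragments follow this one, else 0. */
int frag_more(uint16_t off_field) {
    return (off_field & FRAG_MF) != 0;
}

/* Hypothetical helper: 1 if the packet is a fragment at all, that is,
 * MF is set or the offset is nonzero. DF alone does not count. */
int is_fragment(uint16_t off_field) {
    return (off_field & (FRAG_MF | FRAG_OFFMASK)) != 0;
}
```

For example, a host-order value of 0x2001 decodes to byte offset 8 with MF set, which is what the second fragment in a run of 8-byte fragments looks like.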

Experiment

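Following "TCP/IP Illustrated, Volume 1", we could use the sock program to simulate this (its source code is available online), but nmap works just as well. First prepare two virtual machines, 192.168.10.101 and 192.168.10.102, then run nmap on the .101 machine: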

sudo nmap -T4 192.168.10.102 -p12345 -f -vvv -sU --data-length 1444

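This uses a UDP scan (-sU) with a payload length of 1444 bytes (--data-length 1444), and -f enables IP fragmentation; with -f, nmap by default splits the data after the IP header into pieces of 8 bytes. Then listen on port 12345 on the .102 machine: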

nc -u -l 12345 > /dev/null

OK, finally, capture the traffic on the .101 machine and see what happens:

sudo tcpdump -i eth1 host 192.168.10.102 -v

08:10:49.478396 IP (tos 0x0, ttl 39, id 54987, offset 0, flags [+], proto UDP (17), length 28)
    192.168.10.101.43206 > 192.168.10.102.12345: UDP, length 1444
08:10:49.479000 IP (tos 0x0, ttl 39, id 54987, offset 8, flags [+], proto UDP (17), length 28)
    192.168.10.101 > 192.168.10.102: udp
08:10:49.479489 IP (tos 0x0, ttl 39, id 54987, offset 16, flags [+], proto UDP (17), length 28)
    192.168.10.101 > 192.168.10.102: udp

...

08:10:49.511782 IP (tos 0x0, ttl 39, id 54987, offset 1448, flags [none], proto UDP (17), length 24)
    192.168.10.101 > 192.168.10.102: udp

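Pay attention to the offset and flags values: every fragment carries the same id, the offset grows by 8 bytes each time, and only the last fragment lacks the [+] (more fragments) flag.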
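
The numbers in the capture can be checked with a little arithmetic. The sketch below is a hypothetical helper, not nmap code; it assumes the IP payload here is the 8-byte UDP header plus 1444 data bytes (1452 bytes in total) and that each fragment carries 8 bytes.

```c
#include <stddef.h>

/* One planned fragment: payload byte offset, payload length, MF flag. */
struct frag_plan {
    unsigned offset;
    unsigned len;
    int more;
};

/* Hypothetical helper: split `datalen` bytes of IP payload into fragments
 * of at most `fragsize` bytes. `fragsize` must be a positive multiple of 8,
 * otherwise the offsets could not be expressed in 8-byte units and the
 * function returns 0. Fills at most `max` entries of `out` and returns the
 * total number of fragments needed. */
size_t plan_fragments(unsigned datalen, unsigned fragsize,
                      struct frag_plan *out, size_t max) {
    size_t n = 0;
    unsigned off;

    if (fragsize == 0 || fragsize % 8 != 0)
        return 0;

    for (off = 0; off < datalen; off += fragsize) {
        unsigned left = datalen - off;
        unsigned len = left < fragsize ? left : fragsize;
        if (n < max) {
            out[n].offset = off;
            out[n].len = len;
            out[n].more = off + len < datalen;  /* MF on all but the last */
        }
        n++;
    }
    return n;
}
```

Calling plan_fragments(1452, 8, ...) yields 182 fragments, the last one at offset 1448 with 4 bytes of payload. That matches the final line of the capture, where a 20-byte header plus 4 bytes gives length 24.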

Fragmentation in nmap

Having seen the behavior, let's look at the implementation, starting with how nmap sends the fragments. The code lives mainly in the send_frag_ip_packet function in libnetutil/netutil.cc:

libnetutil/netutil.cc
/* Create and send all fragments of a pre-built IPv4 packet
 * Minimal MTU for IPv4 is 68 and maximal IPv4 header size is 60
 * which gives us a right to cut TCP header after 8th byte
 * (shouldn't we inflate the header to 60 bytes too?) */
int send_frag_ip_packet(int sd, const struct eth_nfo *eth,
  const struct sockaddr_in *dst,
  const u8 *packet, unsigned int packetlen, u32 mtu) {
  struct ip *ip = (struct ip *) packet;
  int headerlen = ip->ip_hl * 4; // better than sizeof(struct ip)
  u32 datalen = packetlen - headerlen;
  int fdatalen = 0, res = 0;
  int fragment=0;

  assert(headerlen <= (int) packetlen);
  assert(headerlen >= 20 && headerlen <= 60); // sanity check (RFC791)
  assert(mtu > 0 && mtu % 8 == 0); // otherwise, we couldn't set Fragment offset (ip->ip_off) correctly

  if (datalen <= mtu) {
    netutil_error("Warning: fragmentation (mtu=%lu) requested but the payload is too small already (%lu)", (unsigned long)mtu, (unsigned long)datalen);
    return send_ip_packet_eth_or_sd(sd, eth, dst, packet, packetlen);
  }

  u8 *fpacket = (u8 *) safe_malloc(headerlen + mtu);
  memcpy(fpacket, packet, headerlen + mtu);
  ip = (struct ip *) fpacket;

  // create fragments and send them
  for (fragment = 1; fragment * mtu < datalen + mtu; fragment++) {
    fdatalen = (fragment * mtu <= datalen ? mtu : datalen % mtu);
    ip->ip_len = htons(headerlen + fdatalen);
    ip->ip_off = htons((fragment - 1) * mtu / 8);
    if ((fragment - 1) * mtu + fdatalen < datalen)
      ip->ip_off |= htons(IP_MF);
#if HAVE_IP_IP_SUM
    ip->ip_sum = 0;
    ip->ip_sum = in_cksum((unsigned short *) ip, headerlen);
#endif
    if (fragment > 1) // copy data payload
      memcpy(fpacket + headerlen,
             packet + headerlen + (fragment - 1) * mtu, fdatalen);
    res = send_ip_packet_eth_or_sd(sd, eth, dst, fpacket, ntohs(ip->ip_len));
    if (res == -1)
      break;
  }
  free(fpacket);
  return res;
}

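The whole process is fairly clear: the function cuts the payload into mtu-sized chunks, sets the offset (in 8-byte units) on each one, sets the MF flag on every chunk but the last, refreshes the header checksum, and sends each fragment. After reading it, the -f option makes a lot more sense.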

Reassembly in libnids

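Next, let's see how the fragments are put back together, starting with libnids. According to the comments, the approach is taken directly from the Linux 2.0.36 kernel source. It works as follows: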

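  1. Check whether the IP packet is a fragment
  2. If it is, hash it and store it in the queue of fragments for its datagram
  3. Once the last fragment has arrived and nothing is missing, merge the fragments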
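
As a rough illustration of step 2, the sketch below shows one way to decide that two fragments belong to the same datagram and to pick a hash bucket for them. RFC 791 identifies a datagram by source, destination, protocol and identification; the struct, the function names and the toy hash are assumptions made for this example and are not libnids's actual data structures.

```c
#include <stdint.h>

/* Hypothetical key: the four IPv4 header fields that, per RFC 791,
 * decide which fragments belong to the same original datagram. */
struct frag_key {
    uint32_t src;    /* source address */
    uint32_t dst;    /* destination address */
    uint16_t id;     /* identification field */
    uint8_t  proto;  /* protocol number, e.g. 17 for UDP */
};

/* 1 if both keys describe fragments of the same datagram, else 0. */
int same_datagram(const struct frag_key *a, const struct frag_key *b) {
    return a->src == b->src && a->dst == b->dst &&
           a->id == b->id && a->proto == b->proto;
}

/* Toy hash for choosing a bucket; only meant to show the idea.
 * Returns 0 when nbuckets is 0 to avoid dividing by zero. */
unsigned frag_bucket(const struct frag_key *k, unsigned nbuckets) {
    uint32_t h = k->src ^ k->dst ^ ((uint32_t)k->id << 16) ^ k->proto;
    h ^= h >> 16;  /* fold the high bits into the low ones */
    return nbuckets ? (unsigned)(h % nbuckets) : 0u;
}
```

Fragments with equal keys always land in the same bucket, so a lookup only has to compare the keys of the queues stored there.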

The libnids code for this is in ip_fragment.c. It is fairly long, so let's go through it step by step:

/* Process an incoming IP datagram fragment. */
static char *
ip_defrag(struct ip *iph, struct sk_buff *skb)
{
  struct ipfrag *prev, *next, *tmp;
  struct ipfrag *tfp;
  struct ipq *qp;
  char *skb2;
  unsigned char *ptr;
  int flags, offset;
  int i, ihl, end;

  if (!hostfrag_find(iph) && skb)
    hostfrag_create(iph);

  /* Start by cleaning up the memory. */
  if (this_host)
    if (this_host->ip_frag_mem > IPFRAG_HIGH_THRESH)
      ip_evictor();

  /* Find the entry of this IP datagram in the "incomplete datagrams" queue. */
  if (this_host)
    qp = ip_find(iph);
  else
    qp = 0;

  /* Is this a non-fragmented datagram? */
  offset = ntohs(iph->ip_off);
  flags = offset & ~IP_OFFSET;
  offset &= IP_OFFSET;
  if (((flags & IP_MF) == 0) && (offset == 0)) {
    if (qp != NULL)
      ip_free(qp);      /* Fragmented frame replaced by full
                   unfragmented copy */
    return 0;
  }

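This part looks up the per-host state with hostfrag_find (creating it if necessary), finds the queue of the datagram the fragment belongs to with ip_find, and then decides whether the packet is a fragment at all. As in the sending code, the decision is based on offset and flags: a packet with no MF flag and a zero offset is a complete datagram. Moving on: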

  /*
    If the queue already existed, keep restarting its timer as long as
    we still are receiving fragments.  Otherwise, create a fresh queue
    entry.
  */
  if (qp != NULL) {
    /* ANK. If the first fragment is received, we should remember the correct
       IP header (with options) */
    if (offset == 0) {
      qp->ihlen = ihl;
      memcpy(qp->iph, iph, ihl + 8);
    }
    del_timer(&qp->timer);
    qp->timer.expires = jiffies() + IP_FRAG_TIME;   /* about 30 seconds */
    qp->timer.data = (unsigned long) qp;    /* pointer to queue */
    qp->timer.function = ip_expire; /* expire function */
    add_timer(&qp->timer);
  }
  else {
    /* If we failed to create it, then discard the frame. */
    if ((qp = ip_create(iph)) == NULL) {
      kfree_skb(skb, FREE_READ);
      return NULL;
    }
  }
  /* Attempt to construct an oversize packet. */
  if (ntohs(iph->ip_len) + (int) offset > 65535) {
    // NETDEBUG(printk("Oversized packet received from %s\n", int_ntoa(iph->ip_src.s_addr)));
    nids_params.syslog(NIDS_WARN_IP, NIDS_WARN_IP_OVERSIZED, iph, 0);
    kfree_skb(skb, FREE_READ);
    return NULL;
  }

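This is mostly preparation: restarting the timeout timer of an existing queue (the timers live in a doubly linked list), creating a new queue entry and allocating its memory when none exists, and rejecting fragments that would produce a datagram larger than 65535 bytes. (ihl, the IP header length in bytes, is set in lines not quoted here.)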

  /* Determine the position of this fragment. */
  end = offset + ntohs(iph->ip_len) - ihl;

  /* Point into the IP datagram 'data' part. */
  ptr = (unsigned char *)(skb->data + ihl);

  /* Is this the final fragment? */
  if ((flags & IP_MF) == 0)
    qp->len = end;

  /*
    Find out which fragments are in front and at the back of us in the
    chain of fragments so far.  We must know where to put this
    fragment, right?
  */
  prev = NULL;
  for (next = qp->fragments; next != NULL; next = next->next) {
    if (next->offset >= offset)
      break;            /* bingo! */
    prev = next;
  }
  /*
    We found where to put this one.  Check for overlap with preceding
    fragment, and, if needed, align things so that any overlaps are
    eliminated.
  */
  if (prev != NULL && offset < prev->end) {
    nids_params.syslog(NIDS_WARN_IP, NIDS_WARN_IP_OVERLAP, iph, 0);
    i = prev->end - offset;
    offset += i;        /* ptr into datagram */
    ptr += i;           /* ptr into fragment data */
  }
  /*
    Look for overlap with succeeding segments.
    If we can merge fragments, do it.
  */
  for (tmp = next; tmp != NULL; tmp = tfp) {
    tfp = tmp->next;
    if (tmp->offset >= end)
      break;            /* no overlaps at all */
    nids_params.syslog(NIDS_WARN_IP, NIDS_WARN_IP_OVERLAP, iph, 0);

    i = end - next->offset; /* overlap is 'i' bytes */
    tmp->len -= i;      /* so reduce size of    */
    tmp->offset += i;       /* next fragment        */
    tmp->ptr += i;
    /*
      If we get a frag size of <= 0, remove it and the packet that it
      goes with. We never throw the new frag away, so the frag being
      dumped has always been charged for.
    */
    if (tmp->len <= 0) {
      if (tmp->prev != NULL)
    tmp->prev->next = tmp->next;
      else
    qp->fragments = tmp->next;

      if (tmp->next != NULL)
    tmp->next->prev = tmp->prev;

      next = tfp;       /* We have killed the original next frame */

      frag_kfree_skb(tmp->skb, FREE_READ);
      frag_kfree_s(tmp, sizeof(struct ipfrag));
    }
  }

  /* Insert this fragment in the chain of fragments. */
  tfp = NULL;
  tfp = ip_frag_create(offset, end, skb, ptr);
    /* 
    No memory to save the fragment - so throw the lot. If we failed 
    the frag_create we haven't charged the queue. 
  */
  if (!tfp) {
    nids_params.no_mem("ip_defrag");
    kfree_skb(skb, FREE_READ);
    return NULL;
  }
  /* From now on our buffer is charged to the queues. */
  tfp->prev = prev;
  tfp->next = next;
  if (prev != NULL)
    prev->next = tfp;
  else
    qp->fragments = tfp;

  if (next != NULL)
    next->prev = tfp;
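
The two overlap cases in the excerpt above can be isolated into a small sketch. The struct and function names are hypothetical and work on plain byte ranges rather than on libnids's ipfrag list; the idea is only to show which side gets trimmed.

```c
#include <stddef.h>

/* Half-open byte range [offset, end) covered by a fragment's payload. */
struct range {
    int offset;
    int end;
};

/* Hypothetical helper for the "overlap with preceding fragment" case:
 * if `cur` starts inside `prev`, move its start up to prev->end.
 * Returns the number of bytes trimmed from the front of `cur`. */
int trim_front(const struct range *prev, struct range *cur) {
    int i = 0;
    if (prev != NULL && cur->offset < prev->end) {
        i = prev->end - cur->offset;
        cur->offset += i;
    }
    return i;
}

/* Hypothetical helper for the "overlap with succeeding fragment" case:
 * if `next` starts before `cur` ends, cut the overlapping bytes off the
 * front of `next`. Returns the remaining length of `next`; a result of
 * zero or less means `next` is fully covered and would be dropped. */
int trim_next(const struct range *cur, struct range *next) {
    if (next->offset < cur->end) {
        int i = cur->end - next->offset;  /* overlap is i bytes */
        next->offset += i;
    }
    return next->end - next->offset;
}
```

With this policy, data already queued at a lower offset is kept when the new fragment overlaps its tail, while queued fragments at a higher offset lose their overlapping prefix to the new fragment.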

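The libnids excerpt above is this long mainly because it has to deal with one problem: fragments whose contents may overlap. Only a few lines remain: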

  /*
    OK, so we inserted this new fragment into the chain.  Check if we
    now have a full IP datagram which we can bump up to the IP
    layer...
  */
  if (ip_done(qp)) {
    skb2 = ip_glue(qp);     /* glue together the fragments */
    return (skb2);
  }
  return (NULL);
}

/* See if a fragment queue is complete. */
static int
ip_done(struct ipq * qp)
{
  struct ipfrag *fp;
  int offset;

  /* Only possible if we received the final fragment. */
  if (qp->len == 0)
    return (0);

  /* Check all fragment offsets to see if they connect. */
  fp = qp->fragments;
  offset = 0;
  while (fp != NULL) {
    if (fp->offset > offset)
      return (0);       /* fragment(s) missing */
    offset = fp->end;
    fp = fp->next;
  }
  /* All fragments are present. */
  return (1);
}

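After each fragment is inserted, ip_done checks the queue: once the final fragment has been received (qp->len is set), it walks the list looking for gaps, and if nothing is missing, ip_glue merges the fragments. That is the whole IP reassembly process, and as you can see it involves quite a few steps.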
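
The completeness check can also be sketched on a plain sorted array. This is a simplified, hypothetical variant of the ip_done idea, not the libnids function itself: it additionally tracks the furthest byte covered so far and compares it with the total length.

```c
#include <stddef.h>

/* Half-open payload range [offset, end) of one received fragment. */
struct piece {
    int offset;
    int end;
};

/* Hypothetical array-based completeness check. `p` holds `n` ranges
 * sorted by offset; `total_len` is the payload length learned from the
 * final fragment, or 0 while that fragment has not arrived yet.
 * Returns 1 if the ranges cover [0, total_len) without gaps, else 0. */
int pieces_complete(const struct piece *p, size_t n, int total_len) {
    int covered = 0;  /* every byte below this index has been seen */
    size_t i;

    if (total_len == 0)
        return 0;  /* final fragment not received yet */

    for (i = 0; i < n; i++) {
        if (p[i].offset > covered)
            return 0;  /* gap: fragment(s) missing */
        if (p[i].end > covered)
            covered = p[i].end;
    }
    return covered >= total_len;
}
```

Three ranges 0-8, 8-16 and 16-20 with a total length of 20 are reported as complete, while dropping the middle one leaves a gap and the check fails.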

Reassembly in the kernel

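After reading the libnids implementation, I got curious about how a current kernel handles IP fragmentation, so I looked at the logic in kernel 3.16.6. The code is mainly in the ip_frag_queue function in net/ipv4/ip_fragment.c. The logic turns out to be largely unchanged, so I won't go into it here; interested readers can read the code themselves.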

Thoughts

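Reading code after understanding the process and the principles behind it is far more efficient, and it sticks better too. Theory really does matter.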
