Wednesday, February 17, 2016

Docker Container Networks: All Things Considered

Docker container networks have to achieve two seemingly conflicting goals:
  1. Provide complete isolation for containers on a host
  2. Provide the service running inside a container not only to other co-located containers, but also to remote hosts
In this article, we will review how Docker container networks achieve these goals.

Docker Container Networks


To provide the service running inside a container in a secure manner, it is important to have control over the networks your applications run on. To see how container networks achieve that, we will examine them from the following perspectives:
  • Network modes
    • Default networks vs user-defined networks
  • Packet Forwarding and Filtering (Netfilter)
    • Port mappings
  • Bridge (veth Interface)
  • DNS Configuration
To enable a service consumer to communicate with the service-providing containers, Docker needs to configure the following entities (a short example follows this list):
  • IP address
    • Providing ways to configure any of the container's network interfaces to support services on different containers
  • Port
    • Providing ways to expose and publish a port on the container (also, mapping it to a port on the host)
  • Rules
    • Controlling access to a container's service via rules associated with the host's netfilter framework, in both the NAT and filter tables (see the Netfilter packet-flow diagram in [41]).
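
For example, a minimal sketch of these entities in action (the image and container names below are only illustrative):

# docker run -d -p 8080:80 --name web nginx
# docker inspect --format '{{ .NetworkSettings.IPAddress }}' web
172.17.0.2

Here the container gets an IP address on the docker0 bridge, container port 80 is published on host port 8080, and Docker adds the corresponding NAT and filter rules for you (more on those rules below).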

    Network Modes


    When you install Docker, it creates three networks automatically:[34,37]
    1. bridge
      • Represents the docker0 (a virtual ethernet bridge) network present in all Docker installations.
        • Each container's network interface is attached to the bridge, and network address translation (NAT) is used when containers need to make themselves visible to the Docker host and beyond.
      • Unless you specify otherwise with the docker run --net option, the Docker daemon connects containers to this network by default.
      • Docker does not support automatic service discovery on the default bridge network.
      • Supports the use of port mapping and docker run --link to allow communications between containers in the docker0 network.
    2. host
      • Adds a container on the host's network stack. You'll find the network configuration inside the container is identical to that of the host.
        • Because containers deployed in host mode share the host's network stack, you can't run the same service on the same port in two containers on the same host.
        • In this mode, you don't get port mapping anymore.
    3. none
      • Tells Docker to put the container in its own network stack but not to configure any of the container's network interfaces.
      • This allows you to create a custom network configuration.
    All these network modes are applied at the container level, so you can certainly have a mix of different network modes on the same Docker host.
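
    For example (a sketch; the image and container names are only illustrative), the mode is picked per container at run time:

    # docker run -d --name web1 nginx                            (default: attached to the docker0 bridge)
    # docker run -d --net=host --name web2 nginx                 (shares the host's network stack; no port mapping)
    # docker run -d --net=none --name work1 busybox sleep 3600   (loopback only; you configure networking yourself)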

    Default Networks vs User-Defined Networks

    Besides the default networks, you can create your own user-defined networks that better isolate containers. Docker provides some default network drivers for creating these networks. The easiest user-defined network to create is a bridge network. This network is similar to the historical, default docker0 network. After you create the network, you can launch containers on it using the docker run --net option. Within a user-defined bridge network, linking is not supported. You can, however, expose and publish container ports on containers in this network, which is useful if you want to make a portion of the bridge network available to an outside network.
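
    A minimal sketch of a user-defined bridge network (this assumes Docker 1.9 or later; the network and container names my_bridge, db and web are only illustrative):

    # docker network create --driver bridge my_bridge
    # docker run -d --net=my_bridge --name db postgres
    # docker run -d --net=my_bridge -p 8080:80 --name web nginx
    # docker network inspect my_bridge                           (shows the subnet and the attached containers)

    Here db and web can reach each other over my_bridge, while web's port 80 is additionally published to the outside world on host port 8080.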

    You can read [34, 37] for more details.

    Packet Forwarding and Filtering


    Whether a container can talk to the world is governed by two factors.
    1. Whether the host machine is forwarding its IP packets
      • In order for a remote host to consume a container's service, the Docker host must act like a router, forwarding traffic to the network associated with the ethernet bridge.
      • IP packet forwarding is governed by the ip_forward system parameter (net.ipv4.ip_forward) on the Docker host
        • Many using Docker will want ip_forward to be on, to at least make communication possible between containers and the wider world.[39]
    2. Whether the host's iptables allow these particular connections [45]
      • Docker will never make changes to your host's iptables rules if you set --iptables=false when the daemon starts. Otherwise the Docker server will append forwarding rules to the DOCKER filter chain.
    Access to a container's service is controlled by rules associated with the host's netfilter framework, in both the NAT and filter tables. A Docker host makes significant use of netfilter rules to aid NAT, and to control access to the containers it hosts.[44]
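
    As a quick sketch, both factors can be checked on the Docker host like this:

    # sysctl net.ipv4.ip_forward                 (1 means the host forwards IP packets)
    net.ipv4.ip_forward = 1
    # iptables -L DOCKER -n                      (forwarding rules Docker appended to the filter table)
    # iptables -t nat -L DOCKER -n               (the NAT rules behind published ports)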

    Netfilter offers various functions and operations for packet filtering, network address translation, and port translation, which provide the functionality required for directing packets through a network, as well as the ability to prevent packets from reaching sensitive locations within a computer network.

    Bridge (veth Interface)


    The default network mode in Docker is bridge. To create a virtual subnet shared between the host machine and every container in bridge mode, Docker binds every veth* interface to the docker0 bridge.

    To show information on the bridge and its attached ports (or interfaces), you do:

    # brctl show
    bridge name bridge id         STP enabled interfaces
    docker0     8000.56847afe9799 no          veth33957e0
                                              veth6cee79b


    To show veth interfaces on a host, you do:

    # ip link list
    3: docker0: mtu 9000 qdisc noqueue state UP
        link/ether 56:84:7a:fe:97:99 brd ff:ff:ff:ff:ff:ff
    11: veth33957e0: mtu 9000 qdisc noqueue master docker0 state UP
        link/ether 3e:01:d1:0f:24:b8 brd ff:ff:ff:ff:ff:ff
    13: veth6cee79b: mtu 9000 qdisc noqueue master docker0 state UP
        link/ether fa:aa:84:15:82:5a brd ff:ff:ff:ff:ff:ff


    Note that there are two containers on the host, hence two veth interfaces are shown. Those virtual interfaces work in pairs:
    • eth0 in the container 
      • Will have an IPv4 address 
      • For all purposes, it looks like a normal interface. 
    • veth interface in the host 
      • Won't have an IPv4 address
    Those two interfaces are connected together: any packet sent on an interface will appear as being received by the other. You can imagine that they are connected by a cross-over cable, if that helps.
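
    If you need to know which host-side veth interface is paired with a given container's eth0, one common trick (a sketch; replace <container> with a real container name or ID, and the index 13 below is just the second container from the listing above) is to compare interface indexes:

    # docker exec <container> cat /sys/class/net/eth0/iflink
    13
    # ip link | grep '^13:'
    13: veth6cee79b: mtu 9000 qdisc noqueue master docker0 state UP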

    DNS Configuration


    How can Docker supply each container with a hostname and DNS configuration, without having to build a custom image with the hostname written inside? Its trick is to overlay three crucial /etc files inside the container with virtual files where it can write fresh information. You can see this by running mount inside a container:[29]

    # mount
    /dev/mapper/vg--docker-dockerVolume on /etc/resolv.conf type btrfs ...
    /dev/mapper/vg--docker-dockerVolume on /etc/hostname type btrfs ...
    /dev/mapper/vg--docker-dockerVolume on /etc/hosts type btrfs ...

    This arrangement allows Docker to do clever things like keep resolv.conf up to date across all containers when the host machine receives new configuration over DHCP later.

    With DHCP, computers request IP addresses and networking parameters automatically from a DHCP server, reducing the need for a network administrator or a user to configure these settings manually. For resource-constrained routers and firewalls, dnsmasq is often used for its small footprint. Dnsmasq provides network infrastructure for small networks: DNS, DHCP, router advertisement and network boot.
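
    If a container needs DNS settings different from the host's, a quick sketch (the server address and search domain below are only examples) is to override them per container at run time:

    # docker run --dns 192.168.1.1 --dns-search example.com busybox cat /etc/resolv.conf
    search example.com
    nameserver 192.168.1.1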

    References

    1. The TCP Maximum Segment Size and Related Topics
    2. Jumbo/Giant Frame Support on Catalyst Switches Configuration Example
    3. Ethernet Jumbo Frames
    4. IP Fragmentation: How to Avoid It? (Xml and More)
    5. The Great Jumbo Frames Debate
    6. Resolve IP Fragmentation, MTU, MSS, and PMTUD Issues with GRE and IPSEC
    7. Sites with Broken/Working PMTUD
    8. Path MTU Discovery
    9. TCP headers
    10. bad TCP checksums
    11. MSS performance consideration
    12. Understanding Routing Table
    13. route (Linux man page)
    14. Docker should set host-side veth MTU #4378
    15. Add MTU to lxc conf to make host and container MTU match
    16. Xen Networking
    17. TCP parameter settings (/proc/sys/net/ipv4)
    18. Change the MTU of a network interface
      • tcp_base_mss, tcp_mtu_probing, etc
    19. MTU manipulation
    20. Jumbo Frames, the gotcha's you need to know! (good)
    21. Understand container communication (Docker)
    22. calicoctl should allow configuration of veth MTU #488 - GitHub
    23. Linux MTU Change Size
    24. Changing the MTU size in Windows Vista, 7 or 8
    25. Linux Configure Jumbo Frames to Boost Network Performance
    26. Path MTU discovery in practice
    27. 10 iptables rules to help secure your Linux box
    28. An Updated Performance Comparison of Virtual Machines and Linux Containers
    29. Network Configuration (Docker)
    30. Storage Concepts in Docker: Persistent Storage
    31. Xen org
    32. Cloud Architectures, Networks, Services, and Management
    33. Cloud Networking
    34. Docker Networking 101 – Host mode
    35. Configuring DNS (Docker)
    36. Configuring dnsmasq to serve my own domain name zone
    37. Understand Docker container networks (Docker)
    38. dnsmasq - A lightweight DHCP and caching DNS server.
    39. Understand Container Communication
    40. Linux: Check if in Same Network
    41. Packet flow in Netfilter and General Networking (diagram)
    42. How to Enable IP Forwarding in Linux
    43. Exposing a port on a live docker container
    44. The docker-proxy
    45. Linux Firewall Tutorial: IPTables Tables, Chains, Rules Fundamentals
    46. iptables (ipset.netfilter.org)
    47. How to find out capacity for network interfaces?
    48. Security Considerations: Enabling/Disabling Ping /Traceroute for Your Network (Xml and More)
    49. How to Read a Traceroute (good)

    Monday, February 15, 2016

    Docker Container: How to Check Memory Size

    Inside a Docker container, the correct way to check its memory size is not to use regular Linux commands such as top or free.

    In this article, we will review how to discover runtime metrics inside a container and focus specifically on memory statistics.  Note that the docker version used in this discussion is 1.6.1.

    Misleading Metrics


    Using either "top" or "free" command, it will report the memory size of 7 GiB instead of 2 GiB (the correct answer) for our container.  Those commands don't know the existence of container and hence report the memory metrics of its host only.

    Docker Stats API


    One way to find out the correct memory statistics is to use the docker stats sub-command. For example, you can type:

    # docker ps
    CONTAINER ID  
    66f4084c6a36  

    # docker stats 66f4084c6a36
    CONTAINER         CPU %       MEM USAGE/LIMIT    MEM %       NET I/O
    66f4084c6a36      0.05%       257.1 MiB/2 GiB    12.55%      198.5 KiB/2.008 MiB


    From the above, we can see that the maximum memory size of the container is 2 GiB.  Note that the information returned by the "top" or "free" command is retrieved from /proc/meminfo:

    # cat /proc/meminfo
    MemTotal:        7397060 kB

    cgroups (or Control Groups)


    As described in [3], Docker containers are built on top of cgroups.  For cgroups, runtime metrics are exposed through:
    • Newer builds
      • Control groups are exposed through a pseudo-filesystem named /sys/fs/cgroup [4]
        • Memory metrics for a container live under /sys/fs/cgroup/memory/docker/<long container ID>
    • Older builds
      • The control groups might be mounted on /cgroup, without distinct hierarchies.
      • To figure out where your control groups are mounted, you can run:
        • $ grep cgroup /proc/mounts
          • Memory metrics are then under /cgroup/memory/docker/<long container ID>
    With either a newer or an older build, you first need to fetch the long-form container ID by typing:

    # docker ps --no-trunc
    CONTAINER ID  
    66f4084c6a3683cc2f41242e4d58a6381072ba64f41ce2e94b75c82099acd732

    In our system, we can find memory runtime metrics under the folder:

    • /cgroup/memory/docker/66f4084c6a3683cc2f41242e4d58a6381072ba64f41ce2e94b75c82099acd732

    For example, relevant memory runtime metrics can be found as follows:
    # cat memory.stat
    cache 84217856
    rss 186290176
    mapped_file 14630912
    swap 0
    pgpgin 83557
    pgpgout 17515
    pgfault 78655
    pgmajfault 43
    inactive_anon 0
    active_anon 186290176
    inactive_file 68890624
    active_file 15327232
    unevictable 0
    hierarchical_memory_limit 2147483648
    hierarchical_memsw_limit 4294967296
    total_cache 84217856
    total_rss 186290176

    In summary, cgroups allow Docker to
    • Group processes and manage their aggregate resource consumption 
    • Share available hardware resources among containers 
    • Limit the memory and CPU consumption of containers 
      • A container can be resized by simply changing the limits of its corresponding cgroup (see the sketch after this list)
      • You can gather information about resource usage in the container by examining Linux control groups in /sys/fs/cgroup
    • Provide a reliable way of terminating all processes inside a container.
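
    As the sketch promised above, resizing our container by changing its cgroup limit might look like this (the new 3 GiB limit is only an example; Docker 1.6.1 itself offers no built-in command for this, so we write to the container's memory cgroup directly):

    # echo 3221225472 > /cgroup/memory/docker/66f4084c6a3683cc2f41242e4d58a6381072ba64f41ce2e94b75c82099acd732/memory.limit_in_bytes
    # cat /cgroup/memory/docker/66f4084c6a3683cc2f41242e4d58a6381072ba64f41ce2e94b75c82099acd732/memory.limit_in_bytes
    3221225472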