firecracker
by 伊布
AWS has open-sourced a new virtualization technology called Firecracker.
Firecracker is an open source virtualization technology that is purpose-built for creating and managing secure, multi-tenant container and function-based services that provide serverless operational models.
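It really does address a pain point of Docker. You may have noticed that /proc/cpuinfo and /proc/meminfo inside a Docker container show the host's values. This exposes host information, and it also misleads some Java applications: the JVM's default heap size is 1/4 of the memory reported by /proc/meminfo (Ref), so when that figure comes from the host, the application can easily run into memory errors.
The usual fix is lxcfs.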
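To make the JVM point concrete, here is a minimal shell sketch that computes one quarter of MemTotal from a meminfo-style file. The helper name `quarter_of_memtotal_kb` is made up for illustration, and the 1/4 ratio is simply the default described above, not a value queried from a JVM.

```shell
# Print one quarter of MemTotal (in kB) from a meminfo-style file.
# quarter_of_memtotal_kb is a hypothetical helper name for this sketch.
quarter_of_memtotal_kb() {
    meminfo_file=${1:-/proc/meminfo}
    # The MemTotal line looks like: "MemTotal:       16314492 kB"
    awk '/^MemTotal:/ { printf "%d\n", $2 / 4; exit }' "$meminfo_file"
}

# Inside a plain Docker container this is a quarter of the host's memory,
# unless something like lxcfs overrides /proc/meminfo.
if [ -r /proc/meminfo ]; then
    echo "1/4 of MemTotal: $(quarter_of_memtotal_kb) kB"
fi
```

Running it on the host and inside a container gives a quick way to see whether the container is looking at the host's figure.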
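Firecracker sits between a traditional VM and Docker, and calls its guests microVMs. Unlike Docker, Firecracker (fc for short) is built on KVM, which gives stronger security and isolation and suits multi-tenant environments. AWS uses it heavily in Lambda.
Let's try it out.
Firecracker drives KVM, so the host must have the KVM packages installed.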
sudo apt-get install qemu-kvm
Once the package is installed, the /dev/kvm device file exists. If anything reports /dev/kvm as Not Found, install the package above first.
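Firecracker also requires Linux kernel 4.14 or later; if yours is older, upgrade the kernel. Ubuntu and deepin keep their kernels fairly current, so they are usually fine. On CentOS you can consider a community-maintained kernel; for CentOS 7 these have already reached 4.18.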
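The kernel requirement can be checked with a few lines of shell. This is a minimal sketch: `kernel_at_least` is a hypothetical helper, it compares only the major and minor numbers, and it assumes the release string starts with `MAJOR.MINOR` as `uname -r` output normally does.

```shell
# Return 0 if a kernel release string is at least MAJOR.MINOR.
# kernel_at_least is a hypothetical helper; it ignores the patch level.
kernel_at_least() {
    kal_release=$1
    kal_want_major=$2
    kal_want_minor=$3
    kal_major=${kal_release%%.*}
    kal_rest=${kal_release#*.}
    kal_minor=${kal_rest%%.*}
    kal_minor=${kal_minor%%[!0-9]*}   # drop suffixes such as "-rc1"
    [ "$kal_major" -gt "$kal_want_major" ] && return 0
    [ "$kal_major" -eq "$kal_want_major" ] && [ "${kal_minor:-0}" -ge "$kal_want_minor" ]
}

if kernel_at_least "$(uname -r)" 4 14; then
    echo "kernel $(uname -r) is new enough"
else
    echo "kernel $(uname -r) is older than 4.14"
fi
```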
Set the permissions on /dev/kvm so that the current user can read and write it.
sudo setfacl -m u:${USER}:rw /dev/kvm
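To confirm that the ACL took effect, a small check like the following can help. It is only a sketch: `check_rw_access` is a name invented here, and it tests nothing beyond the shell's own `-r` and `-w` checks for the current user.

```shell
# Report whether the current user can read and write a device node.
# check_rw_access is a hypothetical helper name for this sketch.
check_rw_access() {
    cra_path=$1
    if [ ! -e "$cra_path" ]; then
        echo "missing"
    elif [ -r "$cra_path" ] && [ -w "$cra_path" ]; then
        echo "ok"
    else
        echo "no-access"
    fi
}

echo "/dev/kvm: $(check_rw_access /dev/kvm)"
```

A result of `missing` points back to the package installation step, while `no-access` suggests the setfacl command did not apply.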
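Start the Firecracker process. It starts an HTTP API server that listens on a Unix domain socket, reachable through the file /tmp/firecracker.sock. The Firecracker binary can be downloaded from the releases page; the latest version at the time of writing is v0.11.0.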
sudo rm -f /tmp/firecracker.sock
sudo ./firecracker-v0.11.0 --api-sock /tmp/firecracker.sock
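If you script these steps instead of typing them into two terminals, the API calls have to wait until the socket file exists. A minimal sketch of such a wait, assuming a one-second polling interval; `wait_for_socket` is a hypothetical helper, not part of Firecracker.

```shell
# Wait until a Unix socket file exists, polling once per second.
# wait_for_socket is a hypothetical helper; the second argument is the
# number of one-second retries (default 5).
wait_for_socket() {
    wfs_path=$1
    wfs_tries=${2:-5}
    while [ "$wfs_tries" -gt 0 ]; do
        [ -S "$wfs_path" ] && return 0
        sleep 1
        wfs_tries=$((wfs_tries - 1))
    done
    # Final check, also covers the case of zero retries.
    [ -S "$wfs_path" ]
}
```

For example, `wait_for_socket /tmp/firecracker.sock 10 || echo "socket never appeared"` could guard the curl calls that follow.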
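Firecracker runs microVMs, so, as with KVM, it needs a kernel and a rootfs. The official guide downloads both from S3, but S3 appears to be blocked here, so I put a copy in Tencent Cloud object storage that you can download instead. A quick plug: Tencent Cloud offers 50 GB of free storage, though unfortunately only 10 GB of traffic per month.
Download both files into the same directory.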
wget https://silenceshell-1255345740.file.myqcloud.com/hello-vmlinux.bin
wget https://silenceshell-1255345740.file.myqcloud.com/hello-rootfs.ext4
Open another terminal and use curl to drive fc.
#!/bin/bash
# Step 1: point the microVM at the kernel image and set its boot arguments.
sudo curl --unix-socket /tmp/firecracker.sock -i \
-X PUT 'http://localhost/boot-source' \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"kernel_image_path": "./hello-vmlinux.bin",
"boot_args": "console=ttyS0 reboot=k panic=1 pci=off"
}'
# Step 2: attach the ext4 image as the writable root block device.
sudo curl --unix-socket /tmp/firecracker.sock -i \
-X PUT 'http://localhost/drives/rootfs' \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"drive_id": "rootfs",
"path_on_host": "./hello-rootfs.ext4",
"is_root_device": true,
"is_read_only": false
}'
# Step 3: boot the microVM.
sudo curl --unix-socket /tmp/firecracker.sock -i \
-X PUT 'http://localhost/actions' \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"action_type": "InstanceStart"
}'
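The three calls above differ only in the endpoint and the JSON body, so they can be folded into one helper. This is a sketch, not an official client: `fc_put`, `FC_SOCK`, and `CURL` are names invented here, and `CURL` exists only so the command can be swapped out (for example with `echo`) to preview what would be sent.

```shell
# Send one PUT request to the Firecracker API socket.
# fc_put, FC_SOCK and CURL are names invented for this sketch.
FC_SOCK=${FC_SOCK:-/tmp/firecracker.sock}

fc_put() {
    fcp_endpoint=$1
    fcp_body=$2
    "${CURL:-curl}" --unix-socket "$FC_SOCK" -i \
        -X PUT "http://localhost/${fcp_endpoint}" \
        -H 'Accept: application/json' \
        -H 'Content-Type: application/json' \
        -d "$fcp_body"
}
```

With this in place the last step becomes `fc_put actions '{"action_type": "InstanceStart"}'`. Because the socket was created by a process started with sudo, the helper likely needs to run as root as well.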
Back in the terminal where firecracker is listening, you can see that the microVM has booted to a login prompt. The username and password are root/root.
sudo ./firecracker-v0.11.0 --api-sock /tmp/firecracker.sock
[ 0.000000] Linux version 4.14.55-84.37.amzn2.x86_64 (mockbuild@ip-10-0-1-79) (gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)) #1 SMP Wed Jul 25 18:47:15 UTC 2018
[ 0.000000] Command line: console=ttyS0 reboot=k panic=1 pci=off root=/dev/vda virtio_mmio.device=4K@0xd0000000:5
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[ 0.000000] e820: BIOS-provided physical RAM map:
...
Welcome to Alpine Linux 3.8
Kernel 4.14.55-84.37.amzn2.x86_64 on an x86_64 (ttyS0)
localhost login: root
Password:
Welcome to Alpine!
The Alpine Wiki contains a large amount of how-to guides and general
information about administrating Alpine systems.
See <http://wiki.alpinelinux.org>.
You can setup the system with the command: setup-alpine
You may change this message by editing /etc/motd.
login[980]: root login on 'ttyS0'
localhost:~# df -h
Filesystem Size Used Available Use% Mounted on
/dev/root 28.0M 21.1M 4.9M 81% /
devtmpfs 10.0M 0 10.0M 0% /dev
tmpfs 11.2M 96.0K 11.1M 1% /run
shm 56.1M 0 56.1M 0% /dev/shm
localhost:~# ps aux
PID USER TIME COMMAND
1 root 0:00 {openrc-init} /sbin/init
2 root 0:00 [kthreadd]
3 root 0:00 [kworker/0:0]
By default a microVM gets 1 vCPU and 128 MB of memory. Checking /proc/cpuinfo and /proc/meminfo inside the guest shows that they now report the correct values.
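The defaults can be changed before boot through the API's machine-config endpoint. The sketch below only builds the JSON body; `machine_config_json` is a hypothetical helper, and the field names `vcpu_count` and `mem_size_mib` are assumed to match the API of the release you run (other fields may be required in some versions, so check the API specification shipped with your binary).

```shell
# Build a JSON body for Firecracker's machine-config endpoint.
# machine_config_json is a hypothetical helper; the field names vcpu_count
# and mem_size_mib are assumptions to verify against your release's API spec.
machine_config_json() {
    printf '{"vcpu_count": %d, "mem_size_mib": %d}\n' "$1" "$2"
}

# The body has to be sent before the InstanceStart action, for example:
#   sudo curl --unix-socket /tmp/firecracker.sock -i \
#     -X PUT 'http://localhost/machine-config' \
#     -H 'Content-Type: application/json' \
#     -d "$(machine_config_json 2 256)"
machine_config_json 2 256
```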
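The root filesystem is 28M, which is simply the size of the rootfs: Firecracker uses the hello-rootfs.ext4 file as a block device. You can write something to the root partition, shut the microVM down (with the reboot command), and start it again; the files written earlier are still there.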
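Since the guest's disk is just a file on the host, its size can be checked from the host side. A minimal sketch, with `image_size_mib` as a hypothetical helper; note that the figure `df` reports inside the guest may be somewhat smaller than the file because of filesystem overhead.

```shell
# Print the size of an image file in whole MiB.
# image_size_mib is a hypothetical helper name for this sketch.
image_size_mib() {
    ism_bytes=$(wc -c < "$1")
    echo $((ism_bytes / 1048576))
}

if [ -f ./hello-rootfs.ext4 ]; then
    echo "hello-rootfs.ext4: $(image_size_mib ./hello-rootfs.ext4) MiB"
fi
```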
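Firecracker starts quickly and is very lightweight. It is impressive technology with good prospects, and well worth studying closely.
There is still a lot I have not figured out, though: networking, the file system, vCPUs, how it compares with Docker, and how it interacts with containerd and KVM.
A full boot log is attached below. The boot sequence looks much like that of a regular KVM guest, except that OpenRC takes the place of the traditional Linux init process.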
sudo ./firecracker-v0.11.0 --api-sock /tmp/firecracker.sock
[ 0.000000] Linux version 4.14.55-84.37.amzn2.x86_64 (mockbuild@ip-10-0-1-79) (gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)) #1 SMP Wed Jul 25 18:47:15 UTC 2018
[ 0.000000] Command line: console=ttyS0 reboot=k panic=1 pci=off root=/dev/vda virtio_mmio.device=4K@0xd0000000:5
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x0000000007ffffff] usable
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] Hypervisor detected: KVM
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] e820: last_pfn = 0x8000 max_arch_pfn = 0x400000000
[ 0.000000] MTRR: Disabled
[ 0.000000] x86/PAT: MTRRs disabled, skipping PAT initialization too.
[ 0.000000] CPU MTRRs all blank - virtualized system.
[ 0.000000] x86/PAT: Configuration [0-7]: WB WT UC- UC WB WT UC- UC
[ 0.000000] found SMP MP-table at [mem 0x0009fc00-0x0009fc0f] mapped at [ffffffffff200c00]
[ 0.000000] Scanning 1 areas for low memory corruption
[ 0.000000] Using GB pages for direct mapping
[ 0.000000] No NUMA configuration found
[ 0.000000] Faking a node at [mem 0x0000000000000000-0x0000000007ffffff]
[ 0.000000] NODE_DATA(0) allocated [mem 0x07fde000-0x07ffffff]
[ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
[ 0.000000] kvm-clock: cpu 0, msr 0:7fdc001, primary cpu clock
[ 0.000000] kvm-clock: using sched offset of 400616677122 cycles
[ 0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff]
[ 0.000000] DMA32 [mem 0x0000000001000000-0x0000000007ffffff]
[ 0.000000] Normal empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff]
[ 0.000000] node 0: [mem 0x0000000000100000-0x0000000007ffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x0000000007ffffff]
[ 0.000000] Intel MultiProcessor Specification v1.4
[ 0.000000] MPTABLE: OEM ID: FC
[ 0.000000] MPTABLE: Product ID: 000000000000
[ 0.000000] MPTABLE: APIC at: 0xFEE00000
[ 0.000000] Processor #0 (Bootup-CPU)
[ 0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] Processors: 1
[ 0.000000] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
[ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x000fffff]
[ 0.000000] e820: [mem 0x08000000-0xffffffff] available for PCI devices
[ 0.000000] Booting paravirtualized kernel on KVM
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.000000] random: get_random_bytes called from start_kernel+0x94/0x486 with crng_init=0
[ 0.000000] setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:1 nr_node_ids:1
[ 0.000000] percpu: Embedded 41 pages/cpu @ffff880007c00000 s128728 r8192 d31016 u2097152
[ 0.000000] KVM setup async PF for cpu 0
[ 0.000000] kvm-stealtime: cpu 0, msr 7c15040
[ 0.000000] PV qspinlock hash table entries: 256 (order: 0, 4096 bytes)
[ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 32137
[ 0.000000] Policy zone: DMA32
[ 0.000000] Kernel command line: console=ttyS0 reboot=k panic=1 pci=off root=/dev/vda virtio_mmio.device=4K@0xd0000000:5
[ 0.000000] PID hash table entries: 512 (order: 0, 4096 bytes)
[ 0.000000] Memory: 111064K/130680K available (8204K kernel code, 622K rwdata, 1464K rodata, 1268K init, 2820K bss, 19616K reserved, 0K cma-reserved)
[ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] Kernel/User page tables isolation: enabled
[ 0.004000] Hierarchical RCU implementation.
[ 0.004000] RCU restricting CPUs from NR_CPUS=128 to nr_cpu_ids=1.
[ 0.004000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[ 0.004000] NR_IRQS: 4352, nr_irqs: 48, preallocated irqs: 16
[ 0.004000] Console: colour dummy device 80x25
[ 0.004000] console [ttyS0] enabled
[ 0.004000] tsc: Detected 3292.374 MHz processor
[ 0.004000] Calibrating delay loop (skipped) preset value.. 6584.74 BogoMIPS (lpj=13169496)
[ 0.004000] pid_max: default: 32768 minimum: 301
[ 0.004000] Security Framework initialized
[ 0.004000] SELinux: Initializing.
[ 0.004000] Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
[ 0.004000] Inode-cache hash table entries: 8192 (order: 4, 65536 bytes)
[ 0.004000] Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
[ 0.004000] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes)
[ 0.004305] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024
[ 0.004782] Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4
[ 0.005304] Spectre V2 : Mitigation: Full generic retpoline
[ 0.005731] Spectre V2 : Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier
[ 0.006366] Spectre V2 : Enabling Restricted Speculation for firmware calls
[ 0.006897] Speculative Store Bypass: Vulnerable
[ 0.017865] Freeing SMP alternatives memory: 28K
[ 0.019147] smpboot: Max logical packages: 1
[ 0.019641] x2apic enabled
[ 0.020004] Switched APIC routing to physical x2apic.
[ 0.021133] ..TIMER: vector=0x30 apic1=0 pin1=0 apic2=-1 pin2=-1
[ 0.021657] smpboot: CPU0: Intel(R) Xeon(R) Processor @ 3.30GHz (family: 0x6, model: 0x3c, stepping: 0x3)
[ 0.022471] Performance Events: unsupported p6 CPU model 60 no PMU driver, software events only.
[ 0.023198] Hierarchical SRCU implementation.
[ 0.023817] smp: Bringing up secondary CPUs ...
[ 0.024000] smp: Brought up 1 node, 1 CPU
[ 0.024000] smpboot: Total of 1 processors activated (6584.74 BogoMIPS)
[ 0.024000] devtmpfs: initialized
[ 0.024000] x86/mm: Memory block size: 128MB
[ 0.024132] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.024868] futex hash table entries: 256 (order: 2, 16384 bytes)
[ 0.025448] NET: Registered protocol family 16
[ 0.025875] cpuidle: using governor ladder
[ 0.026166] cpuidle: using governor menu
[ 0.029273] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[ 0.029803] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 0.030604] dmi: Firmware registration failed.
[ 0.031027] NetLabel: Initializing
[ 0.031295] NetLabel: domain hash size = 128
[ 0.031640] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
[ 0.032025] NetLabel: unlabeled traffic allowed by default
[ 0.032560] clocksource: Switched to clocksource kvm-clock
[ 0.033008] VFS: Disk quotas dquot_6.6.0
[ 0.033325] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.035661] NET: Registered protocol family 2
[ 0.035661] TCP established hash table entries: 1024 (order: 1, 8192 bytes)
[ 0.036171] TCP bind hash table entries: 1024 (order: 2, 16384 bytes)
[ 0.036680] TCP: Hash tables configured (established 1024 bind 1024)
[ 0.037219] UDP hash table entries: 256 (order: 1, 8192 bytes)
[ 0.037675] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes)
[ 0.038184] NET: Registered protocol family 1
[ 0.039042] virtio-mmio: Registering device virtio-mmio.0 at 0xd0000000-0xd0000fff, IRQ 5.
[ 0.039713] platform rtc_cmos: registered platform RTC device (no PNP device found)
[ 0.040738] Scanning for low memory corruption every 60 seconds
[ 0.041341] audit: initializing netlink subsys (disabled)
[ 0.041933] Initialise system trusted keyrings
[ 0.042285] Key type blacklist registered
[ 0.042642] audit: type=2000 audit(1543422234.398:1): state=initialized audit_enabled=0 res=1
[ 0.043343] workingset: timestamp_bits=36 max_order=15 bucket_order=0
[ 0.044790] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 0.047096] Key type asymmetric registered
[ 0.047423] Asymmetric key parser 'x509' registered
[ 0.047821] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
[ 0.048478] io scheduler noop registered (default)
[ 0.048891] io scheduler cfq registered
[ 0.049242] virtio-mmio virtio-mmio.0: Failed to enable 64-bit or 32-bit DMA. Trying to continue, but this might not work.
[ 0.050171] Serial: 8250/16550 driver, 1 ports, IRQ sharing disabled
[ 0.071606] serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a U6_16550A
[ 0.073622] loop: module loaded
[ 0.074310] tun: Universal TUN/TAP device driver, 1.6
[ 0.074776] hidraw: raw HID events driver (C) Jiri Kosina
[ 0.075240] nf_conntrack version 0.5.0 (1024 buckets, 4096 max)
[ 0.075780] ip_tables: (C) 2000-2006 Netfilter Core Team
[ 0.076244] Initializing XFRM netlink socket
[ 0.076634] NET: Registered protocol family 10
[ 0.077341] Segment Routing with IPv6
[ 0.077647] NET: Registered protocol family 17
[ 0.078001] Bridge firewalling registered
[ 0.078350] sched_clock: Marking stable (76197330, 0)->(119073617, -42876287)
[ 0.079048] registered taskstats version 1
[ 0.079376] Loading compiled-in X.509 certificates
[ 0.080370] Loaded X.509 cert 'Build time autogenerated kernel key: 3472798b31ba23b86c1c5c7236c9c91723ae5ee9'
[ 0.081151] zswap: default zpool zbud not available
[ 0.081524] zswap: pool creation failed
[ 0.081912] Key type encrypted registered
[ 0.083270] EXT4-fs (vda): recovery complete
[ 0.083638] EXT4-fs (vda): mounted filesystem with ordered data mode. Opts: (null)
[ 0.084251] VFS: Mounted root (ext4 filesystem) on device 254:0.
[ 0.084830] devtmpfs: mounted
[ 0.085606] Freeing unused kernel memory: 1268K
[ 0.092056] Write protecting the kernel read-only data: 12288k
[ 0.093373] Freeing unused kernel memory: 2016K
[ 0.094823] Freeing unused kernel memory: 584K
2018-11-29T00:23:54.488369418 [:WARN:vmm/src/lib.rs:903] Guest-boot-time = 168891 us 168 ms, 162411 CPU us 162 CPU ms
OpenRC init version 0.35.5.87b1ff59c1 starting
Starting sysinit runlevel
OpenRC 0.35.5.87b1ff59c1 is starting up Linux 4.14.55-84.37.amzn2.x86_64 (x86_64)
* Mounting /proc ...
[ ok ]
* Mounting /run ...
* /run/openrc: creating directory
* /run/lock: creating directory
* /run/lock: correcting owner
* Caching service dependencies ...
Service `hwdrivers' needs non existent service `dev'
[ ok ]
Starting boot runlevel
* Remounting devtmpfs on /dev ...
[ ok ]
* Mounting /dev/mqueue ...
[ ok ]
* Mounting /dev/pts ...
[ ok ]
* Mounting /dev/shm ...
[ ok ]
* Setting hostname ...
[ ok ]
* Checking local filesystems ...
[ ok ]
* Remounting filesystems ...
[ ok[ 0.202513] random: fast init done
]
* Mounting local filesystems ...
[ ok ]
* Loading modules ...
modprobe: can't change directory to '/lib/modules': No such file or directory
modprobe: can't change directory to '/lib/modules': No such file or directory
[ ok ]
* Mounting misc binary format filesystem ...
[ ok ]
* Mounting /sys ...
[ ok ]
* Mounting security filesystem ...
[ ok ]
* Mounting debug filesystem ...
[ ok ]
* Mounting SELinux filesystem ...
[ ok ]
* Mounting persistent storage (pstore) filesystem ...
[ ok ]
Starting default runlevel
* Starting networking ...
* eth0 ...
Device "eth0" does not exist.
ifconfig: eth0: error fetching interface information: Device not found
ifconfig: SIOCSIFADDR: No such device
run-parts: /etc/network/if-up.d/firecracker-tap: exit status 1 [ !! ]
* eth1 ...
Device "eth1" does not exist.
ifconfig: eth1: error fetching interface information: Device not found
ifconfig: SIOCSIFADDR: No such device
run-parts: /etc/network/if-up.d/firecracker-tap: exit status 1 [ !! ]
* eth2 ...
Device "eth2" does not exist.
ifconfig: eth2: error fetching interface information: Device not found
ifconfig: SIOCSIFADDR: No such device
run-parts: /etc/network/if-up.d/firecracker-tap: exit status 1 [ !! ]
* ERROR: networking failed to start
[ 0.311853] random: sshd: uninitialized urandom read (40 bytes read)
* Starting sshd.eth0 ...
[ 0.316682] random: sshd: uninitialized urandom read (40 bytes read) [ ok ]
[ 1.056063] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x2f75287227e, max_idle_ns: 440795361488 ns
Welcome to Alpine Linux 3.8
Kernel 4.14.55-84.37.amzn2.x86_64 on an x86_64 (ttyS0)
localhost login: root
Password:
Welcome to Alpine!
The Alpine Wiki contains a large amount of how-to guides and general
information about administrating Alpine systems.
See <http://wiki.alpinelinux.org>.
You can setup the system with the command: setup-alpine
You may change this message by editing /etc/motd.
login[980]: root login on 'ttyS0'
localhost:~# ps aux
PID USER TIME COMMAND
1 root 0:00 {openrc-init} /sbin/init
2 root 0:00 [kthreadd]
3 root 0:00 [kworker/0:0]
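The log above contains a Guest-boot-time line emitted by Firecracker itself, which puts a number on the fast startup. A small sketch for pulling that value out of a saved log; `guest_boot_time_us` is a hypothetical helper, and the pattern assumes the line format shown in this log, which may differ in other releases.

```shell
# Extract the guest boot time in microseconds from Firecracker console output
# read on stdin. guest_boot_time_us is a hypothetical helper; the pattern
# matches the "Guest-boot-time = N us" format seen in the log above.
guest_boot_time_us() {
    sed -n 's/.*Guest-boot-time = \([0-9][0-9]*\) us.*/\1/p'
}
```

Saving the console output to a file and running `guest_boot_time_us < boot.log` (the file name is just an example) prints the boot time in microseconds.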