0b0000: Background
I've been tinkering with blockchain stuff lately and wanted to set up a node on my own Alibaba Cloud box. I immediately ran into a problem: disk IO was a bit weak and sync speed felt rather unfriendly, so I started looking into ways to speed things up.
Opening Alibaba Cloud's disk purchase page: an 80 GB SSD cloud disk costs ¥0.112 per hour, which works out to over ¥980 a year, and I instantly despaired at this money-obsessed world. After some calm analysis, I decided to buy two 40 GB "ultra cloud disks" (高效云盘) and try building a RAID0 out of them instead: the pair costs ¥0.04 per hour, roughly ¥350 a year, and I immediately felt like I had saved a few hundred million.
0b0001: Getting to Work: Creating the RAID Device
After purchasing the disks and attaching them to the VPS, let's first verify that they show up. Run `ls -l /dev | grep vd`, which outputs:
```
lrwxrwxrwx 1 root root       3 Oct  4 02:22 dvd -> sr0
brw-rw---- 1 root disk 254,  0 Oct  4 02:22 vda
brw-rw---- 1 root disk 254,  1 Oct  4 02:22 vda1
brw-rw---- 1 root disk 254, 16 Oct  4 02:25 vdb  # these two are the newly attached cloud disks
brw-rw---- 1 root disk 254, 32 Oct  4 02:25 vdc  # these two are the newly attached cloud disks
```
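As an extra sanity check, not part of the original steps, `lsblk` can confirm the two new disks and their sizes (it ships with util-linux on virtually every distro):

```sh
# Show name, size, and device type for the two newly attached cloud disks
lsblk -o NAME,SIZE,TYPE /dev/vdb /dev/vdc
```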
Then run `mdadm --examine /dev/vd[b-c]` to inspect them (if the command isn't found, install the tool first with `yum install mdadm` or `apt-get install mdadm`). It should output:
```
mdadm: No md superblock detected on /dev/vdb.
mdadm: No md superblock detected on /dev/vdc.
```
Next, create a partition on each of the two disks. Run `fdisk /dev/vdb` and follow these steps:
- Enter "n" to create a new partition
- Enter "p" to make it a primary partition
- When asked for the partition number and first/last sectors, just press Enter to accept the defaults
- Enter "p" to print the partition table and confirm that "/dev/vdb1" now appears in the list
- Enter "t" to change the partition type
- Enter "fd" to mark the partition as 'Linux raid autodetect'
- Enter "w" to write the changes
Naturally, repeat the same steps for vdc (or script the whole thing, as sketched below).
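A sketch of the same partitioning done non-interactively with sfdisk, in case you'd rather not repeat the fdisk dialog by hand; the `,,fd` script line means "one partition, default start, default size, type fd":

```sh
# Create a single whole-disk partition of type fd (Linux raid autodetect) on each disk
for dev in /dev/vdb /dev/vdc; do
    echo ',,fd' | sfdisk "$dev"
done
```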
Now run `mdadm --examine /dev/vd[b-c]` again; this time it should print something like:
```
/dev/vdb:
   MBR Magic : aa55
Partition[0] :     83884032 sectors at         2048 (type fd)
/dev/vdc:
   MBR Magic : aa55
Partition[0] :     83884032 sectors at         2048 (type fd)
```
Then run `mdadm --create /dev/md0 --level=stripe --raid-devices=2 /dev/vd[b-c]1`, or equivalently `mdadm -C /dev/md0 -l raid0 -n 2 /dev/vd[b-c]1`. It should print something like:
```
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
```
At this point `cat /proc/mdstat` already shows the corresponding RAID info:
```
Personalities : [raid0]
md0 : active raid0 vdc1[1] vdb1[0]
      83818496 blocks super 1.2 512k chunks
```
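The last line shows that mdadm defaulted to 512 KiB chunks. If you want to pin that down or experiment with other values, `--create` accepts a `--chunk` option (in KiB); a sketch:

```sh
# The same create command with the chunk size made explicit (512 KiB is the default here)
mdadm --create /dev/md0 --level=0 --chunk=512 --raid-devices=2 /dev/vdb1 /dev/vdc1
```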
To be extra safe, also run `mdadm -E /dev/vd[b-c]1` and `mdadm --detail /dev/md0` to double-check the details.
Once everything checks out, run `mdadm -E -s -v >> /etc/mdadm.conf` to save the RAID configuration, then run `update-initramfs -u` (on Arch, use `mkinitcpio -p /etc/mkinitcpio.d/somekernel.preset` instead) and `update-grub` to add the initrd/initramfs hooks; the steps are collected below.
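For reference, those persistence steps in one place, with one caveat: on Debian/Ubuntu the file mdadm actually reads is `/etc/mdadm/mdadm.conf`, so adjust the path to your distro:

```sh
# Save the array definition so it is reassembled at boot
mdadm -E -s -v >> /etc/mdadm.conf                     # Debian/Ubuntu: /etc/mdadm/mdadm.conf
update-initramfs -u                                   # rebuild the initramfs (Debian/Ubuntu)
# mkinitcpio -p /etc/mkinitcpio.d/somekernel.preset   # Arch equivalent
update-grub                                           # refresh the GRUB config
```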
0b0010: Formatting and Mounting the Partition
/dev/md0 now exists, but it still needs to be formatted and mounted before it can be used:
- Run `mkfs.ext4 /dev/md0` to format it as ext4 (a stripe-aligned variant is sketched after this list):

```
mke2fs 1.43.4 (31-Jan-2017)
Creating filesystem with 20954624 4k blocks and 5242880 inodes
Filesystem UUID: 6e458e01-b60f-4f64-8a48-c31d0fbd350e
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624, 11239424, 20480000

Allocating group tables: done
Writing inode tables: done
Creating journal (131072 blocks): done
Writing superblocks and filesystem accounting information: done
```
- Next, back up the mount table with `cp /etc/fstab /etc/fstab.bak`
- Then append the new entry with `echo '/dev/md0 /home ext4 barrier=0 0 0' >> /etc/fstab`, which mounts the array at /home; of course, any other mount point works too (a UUID-based variant is sketched after this list)
- Finally, run `mount -a` to mount everything
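An optional refinement that wasn't part of my original steps: ext4 can be told about the RAID geometry at format time so allocation aligns with the stripes. A sketch, assuming the 512 KiB chunk and two data disks from above (4 KiB blocks, so stride = 512/4 = 128 and stripe-width = 128 × 2 = 256):

```sh
# Format with the RAID geometry hinted to ext4; adjust the numbers if your
# chunk size or disk count differs
mkfs.ext4 -E stride=128,stripe-width=256 /dev/md0
```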
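Also, since md device names can occasionally change across reboots, referencing the filesystem by UUID in fstab is a little more robust. A sketch using the UUID that mkfs.ext4 printed above:

```sh
# Confirm the filesystem UUID (it also appears in the mkfs.ext4 output)
blkid /dev/md0
# ...then reference it in fstab instead of the device node
echo 'UUID=6e458e01-b60f-4f64-8a48-c31d0fbd350e /home ext4 barrier=0 0 0' >> /etc/fstab
```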
0b0011: Verifying the Results
In each environment, run `fio -randrepeat=1 -ioengine=libaio -direct=1 -gtod_reduce=1 -name=test -filename=test -bs=4k -iodepth=64 -size=4G -readwrite=randrw -rwmixread=75` to benchmark 4k random reads and writes (the invocation is spelled out below).
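That's a dense one-liner, so here it is again in the double-dash form fio also accepts, spread out for readability (same options, nothing added):

```sh
# 4 KiB random IO against a 4 GiB test file, 75% reads / 25% writes,
# direct IO with libaio at queue depth 64
fio --name=test --filename=test \
    --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --bs=4k --iodepth=64 --size=4G \
    --readwrite=randrw --rwmixread=75
```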
- Alibaba Cloud with the dual-ultra-cloud-disk RAID0 (3862 read / 1235 write; this machine runs Debian, so the fio output format may differ a bit from the CentOS runs below):
```
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.16
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m(1)] [100.0% done] [15448KB/4940KB/0KB /s] [3862/1235/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=10826: Thu Oct  4 02:58:47 2018
  read : io=3070.4MB, bw=16957KB/s, iops=4239, runt=185406msec
  write: io=1025.8MB, bw=5664.1KB/s, iops=1416, runt=185406msec
  cpu          : usr=1.26%, sys=6.08%, ctx=286721, majf=0, minf=9
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=785996/w=262580/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=3070.4MB, aggrb=16957KB/s, minb=16957KB/s, maxb=16957KB/s, mint=185406msec, maxt=185406msec
  WRITE: io=1025.8MB, aggrb=5664KB/s, minb=5664KB/s, maxb=5664KB/s, mint=185406msec, maxt=185406msec

Disk stats (read/write):
    md0: ios=785569/262576, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=392998/131332, aggrmerge=0/36, aggrticks=5676188/234814, aggrin_queue=5911210, aggrutil=90.91%
  vdc: ios=393309/131035, merge=0/28, ticks=5435644/232016, in_queue=5667900, util=89.94%
  vdb: ios=392687/131630, merge=0/45, ticks=5916732/237612, in_queue=6154520, util=90.91%
```
- Alibaba Cloud with a single stock ultra cloud disk (1971 read / 646 write, which looks like roughly half of the above):
```
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.1
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=7887KiB/s,w=2586KiB/s][r=1971,w=646 IOPS][eta 00m:00s]  # the purchase page quotes 1960 IOPS as the reference value for a 20G disk; no lies there
test: (groupid=0, jobs=1): err= 0: pid=8814: Thu Oct  4 03:03:39 2018
  read: IOPS=1965, BW=7863KiB/s (8052kB/s)(3070MiB/399800msec)
   bw (  KiB/s): min= 4488, max= 8682, per=100.00%, avg=7862.77, stdev=256.00, samples=799
   iops        : min= 1122, max= 2170, avg=1965.68, stdev=64.03, samples=799
  write: IOPS=656, BW=2628KiB/s (2691kB/s)(1026MiB/399800msec)
   bw (  KiB/s): min= 1688, max= 3168, per=100.00%, avg=2627.91, stdev=174.95, samples=799
   iops        : min=  422, max=  792, avg=656.96, stdev=43.73, samples=799
  cpu          : usr=0.83%, sys=2.89%, ctx=365355, majf=0, minf=23
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwt: total=785920,262656,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=7863KiB/s (8052kB/s), 7863KiB/s-7863KiB/s (8052kB/s-8052kB/s), io=3070MiB (3219MB), run=399800-399800msec
  WRITE: bw=2628KiB/s (2691kB/s), 2628KiB/s-2628KiB/s (2691kB/s-2691kB/s), io=1026MiB (1076MB), run=399800-399800msec

Disk stats (read/write):
  vda: ios=787413/263981, merge=173/1680, ticks=24796074/718949, in_queue=25519686, util=99.93%
```
- Alibaba Cloud "entry-level enterprise" SSD instance (3706 read / 1255 write):
```
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.1
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=14.5MiB/s,w=5020KiB/s][r=3706,w=1255 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=4218: Thu Oct  4 04:58:11 2018
  read: IOPS=3601, BW=14.1MiB/s (14.8MB/s)(3070MiB/218228msec)
   bw (  KiB/s): min= 5808, max=17261, per=100.00%, avg=14436.27, stdev=920.02, samples=435
   iops        : min= 1452, max= 4315, avg=3609.03, stdev=230.00, samples=435
  write: IOPS=1203, BW=4814KiB/s (4930kB/s)(1026MiB/218228msec)
   bw (  KiB/s): min= 1808, max= 5876, per=100.00%, avg=4824.62, stdev=344.82, samples=435
   iops        : min=  452, max= 1469, avg=1206.12, stdev=86.21, samples=435
  cpu          : usr=1.22%, sys=4.00%, ctx=160750, majf=0, minf=23
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwt: total=785920,262656,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=14.1MiB/s (14.8MB/s), 14.1MiB/s-14.1MiB/s (14.8MB/s-14.8MB/s), io=3070MiB (3219MB), run=218228-218228msec
  WRITE: bw=4814KiB/s (4930kB/s), 4814KiB/s-4814KiB/s (4930kB/s-4930kB/s), io=1026MiB (1076MB), run=218228-218228msec

Disk stats (read/write):
  vda: ios=785183/262515, merge=0/107, ticks=10244671/3660942, in_queue=13905813, util=99.99%
```
- Alibaba Cloud "local SSD" instance with software RAID0 (roughly 30k read / 10k write):
```
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.1
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=129MiB/s,w=42.4MiB/s][r=33.1k,w=10.9k IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=29506: Tue Jan 29 10:09:21 2019
  read: IOPS=34.0k, BW=133MiB/s (139MB/s)(3070MiB/23108msec)
   bw (  KiB/s): min=130288, max=137616, per=100.00%, avg=136045.74, stdev=1311.53, samples=46
   iops        : min=32572, max=34404, avg=34011.48, stdev=327.95, samples=46
  write: IOPS=11.4k, BW=44.4MiB/s (46.6MB/s)(1026MiB/23108msec)
   bw (  KiB/s): min=43072, max=47280, per=100.00%, avg=45469.39, stdev=874.23, samples=46
   iops        : min=10768, max=11820, avg=11367.33, stdev=218.57, samples=46
  cpu          : usr=5.66%, sys=19.66%, ctx=7534, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwt: total=785920,262656,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=133MiB/s (139MB/s), 133MiB/s-133MiB/s (139MB/s-139MB/s), io=3070MiB (3219MB), run=23108-23108msec
  WRITE: bw=44.4MiB/s (46.6MB/s), 44.4MiB/s-44.4MiB/s (46.6MB/s-46.6MB/s), io=1026MiB (1076MB), run=23108-23108msec

Disk stats (read/write):
    md0: ios=779768/265309, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=392960/133444, aggrmerge=0/271, aggrticks=622454/1578, aggrin_queue=589912, aggrutil=81.70%
  vdc: ios=392952/133448, merge=0/0, ticks=575400/1612, in_queue=548200, util=75.15%
  vdb: ios=392968/133440, merge=0/543, ticks=669508/1544, in_queue=631624, util=81.70%
```
- My own local test machine (12900 read / 4247 write):
```
test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.3
Starting 1 process
test: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=50.5MiB/s,w=16.6MiB/s][r=12.9k,w=4247 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=10971: Thu Oct  4 04:41:27 2018
  read: IOPS=12.3k, BW=48.2MiB/s (50.6MB/s)(3070MiB/63671msec)
   bw (  KiB/s): min=28248, max=57680, per=99.94%, avg=49341.31, stdev=4138.57, samples=127
   iops        : min= 7062, max=14420, avg=12335.32, stdev=1034.64, samples=127
  write: IOPS=4125, BW=16.1MiB/s (16.9MB/s)(1026MiB/63671msec)
   bw (  KiB/s): min= 9635, max=19128, per=99.95%, avg=16491.75, stdev=1361.83, samples=127
   iops        : min= 2408, max= 4782, avg=4122.91, stdev=340.48, samples=127
  cpu          : usr=6.75%, sys=24.72%, ctx=394816, majf=0, minf=135
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=785920,262656,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: bw=48.2MiB/s (50.6MB/s), 48.2MiB/s-48.2MiB/s (50.6MB/s-50.6MB/s), io=3070MiB (3219MB), run=63671-63671msec
  WRITE: bw=16.1MiB/s (16.9MB/s), 16.1MiB/s-16.1MiB/s (16.9MB/s-16.9MB/s), io=1026MiB (1076MB), run=63671-63671msec

Disk stats (read/write):
  dm-0: ios=787183/262248, merge=0/0, ticks=3056810/970151, in_queue=4028413, util=99.97%, aggrios=785350/262402, aggrmerge=3793/484, aggrticks=3052880/970600, aggrin_queue=4023576, aggrutil=99.94%
  sda: ios=785350/262402, merge=3793/484, ticks=3052880/970600, in_queue=4023576, util=99.94%
```
- For whatever reason the test wouldn't run on Amazon AWS, Vultr, or DO; Sakura's entry-level machine did 6046 read / 1997 write. (When I started editing this post, Ethereum had just begun syncing; by the time I wrote this section, the sync had already finished. Yay!)
0b0100: Conclusions
- On Alibaba Cloud at least, a two-disk RAID0 really does speed things up: compared with an SSD cloud disk, two half-capacity "ultra cloud disks" in software RAID deliver similar performance. Instagram has done something along the same lines.
- Price-wise, Alibaba Cloud's SSD cloud disks cost roughly twice as much as ultra cloud disks, so if your budget really is tight, this setup is worth considering.
- RAID0 itself is decidedly unsafe: if any single member fails, the whole array is gone. That said, Alibaba Cloud officially bills its cloud disks as offering "high performance, durability, and high reliability", so I figure I can just keep an eye on it for a while (one way to do so is sketched below).
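A minimal sketch of what "keeping an eye on it" could look like, using mdadm's built-in monitor mode; it assumes a working local MTA, and the address is a placeholder:

```sh
# Run mdadm as a monitoring daemon and send mail on events such as a
# degraded array or a failed member (address is hypothetical)
mdadm --monitor --scan --daemonise --mail=admin@example.com
```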