Using s3cmd

I only use s3cmd every so often, so this is a memo to myself.
This time it's s3cmd-1.5.0-beta1 on Amazon Linux.

Preparation


First, prepare a user that can access S3 and an S3 bucket.


I'll create a user with permissions to access S3.
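
For reference, a minimal sketch of an IAM policy that grants the user full access to S3 (equivalent to the AmazonS3FullAccess managed policy; narrow the Resource for real use):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:*",
            "Resource": "*"
        }
    ]
}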



It's possible to create the bucket with s3cmd too, but this time I created it in advance.
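
If I were to create it with s3cmd instead (after finishing the setup below), it would look something like this; think-t is the bucket name used throughout this memo:

$ s3cmd mb s3://think-t
Bucket 's3://think-t/' created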



I also set it up to keep logs.
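
As far as I know, s3cmd itself can toggle bucket logging with the accesslog command; the target prefix below is just an example:

$ s3cmd accesslog --access-logging-target-prefix=s3://think-t/logs/ s3://think-t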


Installation and configuration


Installation is easy.

# wget http://downloads.sourceforge.net/project/s3tools/s3cmd/1.5.0-beta1/s3cmd-1.5.0-beta1.tar.gz
# tar -xvzf s3cmd-1.5.0-beta1.tar.gz
# cd s3cmd-1.5.0-beta1
# python setup.py install
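
Checking the version confirms the install went through:

# s3cmd --version
s3cmd version 1.5.0-beta1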


Next, create the configuration file.

$ s3cmd --configure

Enter new values or accept defaults in brackets with Enter.
Refer to user manual for detailed description of all options.

Access key and Secret key are your identifiers for Amazon S3
Access Key: <Your Access Key>
Secret Key: <Your Secret Key>

Encryption password is used to protect your files from reading
by unauthorized persons while in transfer to S3
Encryption password: <Your Encryption password>
Path to GPG program [/usr/bin/gpg]:

When using secure HTTPS protocol all communication with Amazon S3
servers is protected from 3rd party eavesdropping. This method is
slower than plain HTTP and can't be used if you're behind a proxy
Use HTTPS protocol [No]:

On some networks all internet access must go through a HTTP proxy.
Try setting it here if you can't connect to S3 directly
HTTP Proxy server name:

New settings:
  Access Key: <Your Access Key>
  Secret Key: <Your Secret Key>
  Encryption password: <Your Encryption password>
  Path to GPG program: /usr/bin/gpg
  Use HTTPS protocol: False
  HTTP Proxy server name:
  HTTP Proxy server port: 0

Test access with supplied credentials? [Y/n] Y
Please wait, attempting to list all buckets...
Success. Your access key and secret key worked fine :-)

Now verifying that encryption works...
Success. Encryption and decryption worked fine :-)

Save settings? [y/N] y
Configuration saved to '/home/ec2-user/.s3cfg'
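
The answers are saved in plain text, so ~/.s3cfg can also be edited directly later. The relevant keys look roughly like this (an excerpt from memory):

access_key = <Your Access Key>
secret_key = <Your Secret Key>
gpg_passphrase = <Your Encryption password>
use_https = False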

Trying it out (Linux version)

File operations work intuitively.

$ s3cmd ls s3://think-t/work/1M.img
2014-05-25 01:23   1024000   s3://think-t/work/1M.img

$ s3cmd put 1M.img s3://think-t/work/
1M.img -> s3://think-t/work/1M.img  [1 of 1]
 1024000 of 1024000   100% in    0s     5.01 MB/s  done

$ s3cmd get s3://think-t/work/1M.img 1M.img
s3://think-t/work/1M.img -> 1M.img  [1 of 1]
 1024000 of 1024000   100% in    0s     6.05 MB/s  done

$ s3cmd del s3://think-t/work/1M.img
File s3://think-t/work/1M.img deleted
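
Everything under a prefix can be removed in one go with --recursive (handle with care):

$ s3cmd del --recursive s3://think-t/work/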


It can also be used like rsync.

$ s3cmd sync work s3://think-t/
work/tmp1/1M.img -> s3://think-t/work/tmp1/1M.img  [1 of 9]
 1024000 of 1024000   100% in    0s     6.13 MB/s  done
work/tmp1/2M.img -> s3://think-t/work/tmp1/2M.img  [2 of 9]
 2048000 of 2048000   100% in    0s    10.40 MB/s  done
work/tmp1/5M.img -> s3://think-t/work/tmp1/5M.img  [3 of 9]
 5120000 of 5120000   100% in    0s    13.51 MB/s  done
work/tmp2/10M.img -> s3://think-t/work/tmp2/10M.img  [4 of 9]
 10240000 of 10240000   100% in    0s    18.02 MB/s  done
work/tmp2/20M.img -> s3://think-t/work/tmp2/20M.img  [part 1 of 2, 15MB]
 15728640 of 15728640   100% in    0s    22.55 MB/s  done
work/tmp2/20M.img -> s3://think-t/work/tmp2/20M.img  [part 2 of 2, 4MB]
 4751360 of 4751360   100% in    0s    18.15 MB/s  done
work/tmp2/50M.img -> s3://think-t/work/tmp2/50M.img  [part 1 of 4, 15MB]
 15728640 of 15728640   100% in    0s    20.92 MB/s  done
(snip)
work/tmp2/50M.img -> s3://think-t/work/tmp2/50M.img  [part 4 of 4, 3MB]
 4014080 of 4014080   100% in    0s    18.28 MB/s  done
work/tmp3/100M.img -> s3://think-t/work/tmp3/100M.img  [part 1 of 7, 15MB]
 15728640 of 15728640   100% in    0s    23.07 MB/s  done
(snip)
work/tmp3/100M.img -> s3://think-t/work/tmp3/100M.img  [part 7 of 7, 7MB]
 8028160 of 8028160   100% in    0s     8.07 MB/s  done
work/tmp3/200M.img -> s3://think-t/work/tmp3/200M.img  [part 1 of 14, 15MB]
 15728640 of 15728640   100% in    1s     9.13 MB/s  done
(snip)
work/tmp3/200M.img -> s3://think-t/work/tmp3/200M.img  [part 14 of 14, 320kB]
 327680 of 327680   100% in    0s     4.67 MB/s  done
work/tmp3/500M.img -> s3://think-t/work/tmp3/500M.img  [part 1 of 33, 15MB]
 15728640 of 15728640   100% in    1s     8.82 MB/s  done
(snip)
work/tmp3/500M.img -> s3://think-t/work/tmp3/500M.img  [part 33 of 33, 8MB]
 8683520 of 8683520   100% in    1s     5.42 MB/s  done
Process files that was not remote copied
Done. Uploaded 909312000 bytes in 95.0 seconds, 9.12 MB/s.  Copied 0 files saving 0 bytes transfer.
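
To push the rsync analogy further, sync also takes --dry-run and --delete-removed (remove remote files that no longer exist locally); a cautious run might look like:

$ s3cmd sync --dry-run --delete-removed work s3://think-t/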


By default, files of 15 MB or more are automatically split into a multipart upload.
(In .s3cfg: "enable_multipart = True" and "multipart_chunk_size_mb = 15".)

$ s3cmd put 6G.img s3://think-t/work/
6G.img -> s3://think-t/work/6G.img  [part 1 of 391, 15MB]
 15728640 of 15728640   100% in    1s    12.16 MB/s  done
6G.img -> s3://think-t/work/6G.img  [part 2 of 391, 15MB]
 15728640 of 15728640   100% in    1s     8.11 MB/s  done
6G.img -> s3://think-t/work/6G.img  [part 3 of 391, 15MB]
(snip)
6G.img -> s3://think-t/work/6G.img  [part 391 of 391, 9MB]
 9830400 of 9830400   100% in    1s     7.38 MB/s  done


Since S3 bills per request, when handling large files it seems best to control the part size,
either with the "--multipart-chunk-size-mb" option or by revisiting the .s3cfg settings.
(The 5,120 MB = 5 GB used below is the largest part size S3 accepts.)

$ s3cmd put --multipart-chunk-size-mb=5120 6G.img s3://think-t/work/
6G.img -> s3://think-t/work/6G.img  [part 1 of 2, 5GB]
 5368709120 of 5368709120   100% in  624s     8.20 MB/s  done
6G.img -> s3://think-t/work/6G.img  [part 2 of 2, 739MB]
 775290880 of 775290880   100% in   90s     8.14 MB/s  done
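
One caveat: if a multipart upload is interrupted, the finished parts stay in S3 (and are billed) until aborted. If I remember right, s3cmd 1.5.0 has multipart and abortmp commands for exactly this; <UploadId> is a placeholder for the Id that multipart lists:

$ s3cmd multipart s3://think-t
$ s3cmd abortmp s3://think-t/work/6G.img <UploadId>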


The info command gets you more detailed information about a file.

$ s3cmd info s3://think-t/work/1M.img
s3://think-t/work/1M.img (object):
   File size: 1024000
   Last mod:  Sun, 25 May 2014 01:29:15 GMT
   MIME type: application/octet-stream; charset=binary
   MD5 sum:   80ec129d645c70cf0de45b1a5a682235
   SSE:       NONE
   policy: none
   ACL:       ----: FULL_CONTROL


If you just want the hash, the "--list-md5" option will do.

$ s3cmd ls --list-md5  s3://think-t/work/1M.img
2014-05-25 01:29   1024000   80ec129d645c70cf0de45b1a5a682235  s3://think-t/work/1M.img


Show all the directories.

$ s3cmd la --bucket-location=US s3://think-t/work/
                       DIR   s3://think-t/logs/
                       DIR   s3://think-t/work/


Show the size of the bucket.

$ s3cmd du s3://think-t/
1166628  s3://think-t/
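
Adding the -H (human-readable sizes) option makes du output easier to read:

$ s3cmd du -H s3://think-t/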


As always, it's handy.
That's it for today.