Snappy compression

Installing Snappy

Configure the CentOS EPEL repository, then install the package:
sudo yum install snappy

Using Snappy in MapReduce

To compress intermediate map output, add the following to mapred-site.xml:

<property>
  <name>mapred.compress.map.output</name> 
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name> 
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>

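These settings can also be applied per job.

The following properties control compression of the final MapReduce output. They can likewise be set per job.

mapred.output.compress: whether to compress the job's final output (true or false).
mapred.output.compression.codec: the codec used to compress the final output.
mapred.output.compression.type: for SequenceFile output, the compression type to use (NONE, RECORD, or BLOCK). BLOCK is recommended.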
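
As a rough sketch of the per-job route, the three properties above can be passed on the command line as Hadoop generic options (-D property=value). This assumes the job's driver parses generic options (for example through ToolRunner); my-job.jar, com.example.MyJob, /input and /output are hypothetical placeholders, not names from this document. The snippet only prints the command so it can be reviewed first.

```shell
# Per-job Snappy settings for the final output, as Hadoop generic options.
# Assumption: the job driver accepts -D options (e.g. it runs via ToolRunner).
SNAPPY_OPTS="-D mapred.output.compress=true \
-D mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec \
-D mapred.output.compression.type=BLOCK"

# Generic options go after the main class and before the job's own arguments.
# my-job.jar, com.example.MyJob, /input and /output are hypothetical.
JOB_CMD="hadoop jar my-job.jar com.example.MyJob $SNAPPY_OPTS /input /output"

# Print the assembled command; nothing is executed here.
echo "$JOB_CMD"
```

Settings passed this way apply only to that job and leave mapred-site.xml untouched.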

Using Snappy in Pig

Set the same properties as for MapReduce.

Using Snappy in Hive

To compress SequenceFile output written by Hive:

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

Using Snappy in Flume

Depending on the architecture of the machine you are installing on, add one of the following lines to /usr/lib/flume/bin/flume-env.sh:

For 32-bit platforms:
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32
For 64-bit platforms:
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64
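
If you are unsure which line applies, the choice can be derived from the output of uname -m. The following is a minimal sketch: native_lib_dir is a hypothetical helper name, and the mapping from architecture strings to the two directories above is an assumption about typical uname output. It prints the export line rather than editing flume-env.sh.

```shell
# Map a machine architecture string to one of the two native library
# directories listed above. native_lib_dir is a hypothetical helper.
native_lib_dir() {
  case "$1" in
    x86_64|amd64)
      echo "/usr/lib/hadoop/lib/native/Linux-amd64-64" ;;
    i386|i486|i586|i686)
      echo "/usr/lib/hadoop/lib/native/Linux-i386-32" ;;
    *)
      # Any other architecture is not covered by this section.
      echo "unsupported architecture: $1" >&2
      return 1 ;;
  esac
}

# Print the line to add to flume-env.sh for this machine, if supported.
if dir=$(native_lib_dir "$(uname -m)"); then
  echo "export JAVA_LIBRARY_PATH=$dir"
fi
```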
The following section explains how to take advantage of Snappy compression.

Using Snappy compression in Flume Sinks

You can specify Snappy as a compression codec in Flume's configuration language. For example, the following specifies a Snappy-compressed SequenceFile sink on HDFS:

customdfs("hdfs://namenode/path", seqfile("snappy"))

Using Snappy in Sqoop

On the command line, use the following option to enable Snappy compression:

--compression-codec org.apache.hadoop.io.compress.SnappyCodec

It is a good idea to use the --as-sequencefile option together with this compression option.
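
Put together, an import using both options might look like the sketch below. The JDBC URL, table name, and target directory are hypothetical placeholders. The --compress flag is included on the assumption that your Sqoop version needs it to switch compression on in addition to naming the codec; check your version's documentation before relying on it. The snippet only prints the command.

```shell
# Assemble a Sqoop import that writes a Snappy-compressed SequenceFile.
# Hypothetical placeholders: dbhost/mydb, mytable, /user/me/mytable.
# Assumption: --compress is required to enable compression in your version.
SQOOP_CMD="sqoop import \
--connect jdbc:mysql://dbhost/mydb \
--table mytable \
--target-dir /user/me/mytable \
--as-sequencefile \
--compress \
--compression-codec org.apache.hadoop.io.compress.SnappyCodec"

# Print the assembled command; nothing is executed here.
echo "$SQOOP_CMD"
```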

Using Snappy in HBase

You need to configure HBase to use Snappy only if you installed Hadoop and HBase from tarballs; if you installed them from RPM or Debian packages, Snappy requires no HBase configuration. Depending on the architecture of the machine you are installing on, add one of the following lines to /etc/hbase/conf/hbase-env.sh:

For 32-bit platforms:
export HBASE_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32
For 64-bit platforms:
export HBASE_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64

To use Snappy compression in HBase Tables, specify the column family compression as snappy. For example, in the shell:

create 'mytable', {NAME => 'mycolumnfamily:', COMPRESSION => 'snappy'}
