Hadoop Serialization
To use Hadoop's serialization, a class that should be serializable within the Hadoop framework must implement the two methods of the Writable interface:
Java code
public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
This is more work than implementing Java's Serializable, but a comparison shows that Hadoop's serialization mechanism produces far less data than Java's built-in serialization.
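As a rough illustration of that size difference, here is a minimal sketch (not from the original article; the class name SizeComparison is made up) that serializes a single int both ways. An IntWritable occupies exactly the 4 bytes of the int, while ObjectOutputStream adds a stream header and class descriptor and typically needs dozens of bytes:
Java code
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

import org.apache.hadoop.io.IntWritable;

public class SizeComparison {
    public static void main(String[] args) throws IOException {
        // Hadoop Writable: exactly the 4 bytes of the int value
        ByteArrayOutputStream hadoopBytes = new ByteArrayOutputStream();
        new IntWritable(42).write(new DataOutputStream(hadoopBytes));

        // Java serialization: the value plus stream header and class metadata
        ByteArrayOutputStream javaBytes = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(javaBytes);
        oos.writeObject(Integer.valueOf(42));
        oos.close();

        System.out.println("Writable:     " + hadoopBytes.size() + " bytes"); // 4
        System.out.println("Serializable: " + javaBytes.size() + " bytes");   // several dozen
    }
}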
Within these two methods you control reading and writing the fields yourself. If the class holds a reference to another object, that object should also implement Writable (it does not strictly have to, as long as you handle storing and restoring its fields yourself; a short sketch of this follows).
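For example, a hedged sketch (the class Labelled is hypothetical, not from the original code) of handling plain Java fields by hand, without wrapping them in Writable types:
Java code
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical class whose fields are plain Java types rather than Writables;
// write()/readFields() store and restore them by hand via DataOutput/DataInput.
public class Labelled implements Writable {
    private String label;
    private int count;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(label);   // DataOutput encodes the String directly
        out.writeInt(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        label = in.readUTF();  // read back in exactly the order written
        count = in.readInt();
    }
}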
Below is a simple example.
The Attribute class:
Java code
package siat.miner.etl.instance;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class Attribute implements Writable {
    public static final int ATTRIBUTE_TYPE_STRING = 1;  // string type
    public static final int ATTRIBUTE_TYPE_NOMINAL = 2; // nominal type
    public static final int ATTRIBUTE_TYPE_REAL = 3;    // real type

    private IntWritable type;
    private Text name;

    public IntWritable getType() {
        return type;
    }

    public void setType(int type) {
        this.type = new IntWritable(type);
    }

    public Text getName() {
        return name;
    }

    public void setName(String name) {
        this.name = new Text(name);
    }

    public Attribute() {
        super();
        this.type = new IntWritable(0);
        this.name = new Text("");
    }

    public Attribute(int type, String name) {
        super();
        this.type = new IntWritable(type);
        this.name = new Text(name);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // read the fields back in the same order they were written
        type.readFields(in);
        name.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // delegate to the Writable fields; order must match readFields()
        type.write(out);
        name.write(out);
    }
}
The TestA class:
Java code
package siat.miner.etl.test;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

import siat.miner.etl.instance.Attribute;

public class TestA implements Writable {
    private Attribute a;
    private IntWritable b;

    /**
     * @param args
     * @throws IOException
     */
    public static void main(String[] args) throws IOException {
        Attribute a = new Attribute(Attribute.ATTRIBUTE_TYPE_NOMINAL, "name");
        TestA ta = new TestA(a, new IntWritable(1));

        // serialize ta into an in-memory byte stream
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream oos = new DataOutputStream(bos);
        ta.write(oos);

        // deserialize the bytes back into a fresh instance
        TestA tb = new TestA();
        tb.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
    }

    public TestA(Attribute a, IntWritable b) {
        super();
        this.a = a;
        this.b = b;
    }

    public TestA() {
        // no-arg constructor so readFields() can rebuild the object
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // recreate each field, then read it in the order it was written
        a = new Attribute();
        a.readFields(in);
        b = new IntWritable();
        b.readFields(in);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // delegate to each field; Attribute is itself a Writable
        a.write(out);
        b.write(out);
    }
}
As the example shows, Hadoop's serialization mechanism relies on Java's DataInput and DataOutput to serialize the primitive types, and leaves serializing user-defined classes to the user.
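Building on the round trip in TestA, the same steps can be factored into small reusable helpers. This is only a sketch; WritableHelper is a hypothetical name, not part of Hadoop:
Java code
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class WritableHelper {
    // Serialize any Writable into a byte array.
    public static byte[] toBytes(Writable w) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        w.write(new DataOutputStream(bos));
        return bos.toByteArray();
    }

    // Populate an existing Writable instance from a byte array and return it.
    public static <T extends Writable> T fromBytes(byte[] bytes, T instance) throws IOException {
        instance.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
        return instance;
    }
}
With the classes above, a round trip would then read, for example: Attribute copy = WritableHelper.fromBytes(WritableHelper.toBytes(original), new Attribute());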