Hadoop File System Architecture Analysis

htsjk.Com, 2019-08-26 23:47


(A major assignment for a software architecture course: reading and analyzing the Hadoop file system.)

Foreword / Afterword

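FileSystem = AbstractFileSystem -> aimed at file system implementers

FileContext -> aimed at application writers

The concrete file systems in the fs package are the lowest-level classes. Each one implements access to its corresponding file system, and all of them inherit from FileSystem/AbstractFileSystem. HDFS is a separate system in its own right, comparable to a VFS, and is reached through the same interface used to operate on the different file systems. Although the HDFS class inherits from AbstractFileSystem, the real work is done by the DFSClient class (a borrowed shell, so to speak): the subclass exists only to make HDFS look like a file system. The actual implementation lives in a separate hdfs directory (with its own DFSInputStream), independent of the fs directory.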
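The wrapper relationship described above can be sketched roughly as follows. This is only an illustration, not Hadoop's code: SketchFileSystem, SketchLocalFs, SketchHdfs, and SketchDfsClient are hypothetical names standing in for the abstract base class, a local implementation, the HDFS subclass, and DFSClient.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the abstract base that every concrete file system extends.
abstract class SketchFileSystem {
    abstract String scheme();
    abstract String read(String path);
}

// Hypothetical local implementation: it does its own work directly.
class SketchLocalFs extends SketchFileSystem {
    private final Map<String, String> files = new HashMap<>();

    void put(String path, String data) { files.put(path, data); }

    @Override String scheme() { return "file"; }

    @Override String read(String path) { return files.get(path); }
}

// Hypothetical stand-in for DFSClient: the part that does the real distributed work.
class SketchDfsClient {
    String fetch(String path) { return "blocks-of:" + path; }
}

// Hypothetical HDFS subclass: a thin shell that forwards every call to the client.
class SketchHdfs extends SketchFileSystem {
    private final SketchDfsClient client = new SketchDfsClient();

    @Override String scheme() { return "hdfs"; }

    @Override String read(String path) { return client.fetch(path); }
}
```

Callers hold only a SketchFileSystem reference, so they cannot tell whether the work happens in the subclass itself or in a delegate behind it.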

Structure of FileSystem:

extends Configured (provides access to the configuration)

File creation, reading, renaming, copying, deletion, and similar operations

GlobFilter

Description: a class that decides whether a string matches a glob pattern.
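As a rough illustration of glob matching, the sketch below uses the JDK's own PathMatcher rather than Hadoop's GlobFilter, so the exact pattern syntax may differ from Hadoop's. GlobSketch and its matches method are hypothetical names.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class GlobSketch {
    // Returns true when the plain file name matches the glob pattern.
    // Uses the JDK glob syntax, which is an assumption swapped in for Hadoop's own.
    static boolean matches(String glob, String name) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(name));
    }
}
```

For example, GlobSketch.matches("part-*", "part-00000") accepts the name, while GlobSketch.matches("*.txt", "a.log") rejects it.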

Cache:

Caches FileSystem objects; internally it holds a HashMap.

Statistics:

statisticsTable is an IdentityHashMap
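What sets an IdentityHashMap apart is that it compares keys by reference (==) rather than by equals(). The sketch below shows only that difference; IdentityMapSketch is a hypothetical name, and the String keys are an assumption chosen to make the effect visible, not the key type Hadoop uses.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityMapSketch {
    // Inserts two keys that are equal by equals() but are distinct objects,
    // then reports how many entries the map ended up with.
    static int sizeAfterTwoPuts(Map<String, Integer> map) {
        String k1 = new String("hdfs");
        String k2 = new String("hdfs"); // equals(k1), but a different object
        map.put(k1, 1);
        map.put(k2, 2);
        return map.size();
    }
}
```

A HashMap ends up with one entry here and an IdentityHashMap with two. Identity comparison is a natural fit when the keys are objects that exist only once, such as class objects.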

BlockLocation:

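Holds the detailed information for one file block (all replicas, host ports, network topology, offset within the file, block length, and so on).

Java fields: String[] {replica hosts}; String[] {replica host:port names}; String[] {topology paths of the hosts in the network};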
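A minimal data holder along these lines might look like the sketch below. BlockInfoSketch and its covers method are hypothetical; the field names follow the description above, not necessarily Hadoop's BlockLocation.

```java
class BlockInfoSketch {
    final String[] hosts;          // hostnames that hold a replica of the block
    final String[] names;          // "host:port" for each replica
    final String[] topologyPaths;  // network-topology path of each replica
    final long offset;             // where the block starts within the file
    final long length;             // number of bytes in the block

    BlockInfoSketch(String[] hosts, String[] names, String[] topologyPaths,
                    long offset, long length) {
        this.hosts = hosts;
        this.names = names;
        this.topologyPaths = topologyPaths;
        this.offset = offset;
        this.length = length;
    }

    // True when the given file position falls inside this block.
    boolean covers(long filePos) {
        return filePos >= offset && filePos < offset + length;
    }
}
```

With the offset and length recorded per block, a reader can work out which block (and therefore which hosts) serves a given position in the file.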

FileStatus:

Used together with PathFilter, which filters files so that only the wanted ones are returned.

Holds the file status: path, length, isdir, block_replication, and so on.

Trash：

Provides the trash (recycle bin) feature.
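The core idea of a trash can, moving a file aside instead of deleting it, can be sketched with the JDK's java.nio.file API. This is not Hadoop's Trash class: TrashSketch, moveToTrash, and the trash directory passed in are all hypothetical, and features such as expiry are left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class TrashSketch {
    // Moves the file into trashDir (created if missing) and returns its new location.
    // Fails if a file with the same name is already in the trash directory.
    static Path moveToTrash(Path file, Path trashDir) throws IOException {
        Files.createDirectories(trashDir);
        Path target = trashDir.resolve(file.getFileName());
        return Files.move(file, target);
    }
}
```

Because the file is only moved, it can still be restored by moving it back until the trash directory is emptied.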

Useful articles on fs:

http://huashuizhuhui.iteye.com/blog/1867511

http://huashuizhuhui.iteye.com/category/209973

Other classes:

DFSClient:

DFSClient can connect to a Hadoop FileSystem and perform basic file tasks. It uses the ClientProtocol to communicate with a NameNode daemon, and connects directly to DataNodes to read/write block data. HDFS users should obtain an instance of DistributedFileSystem, which uses DFSClient to handle file system tasks.

Requirements

How a file system is created:

//FileSystem.get()
public static FileSystem get(URI uri, Configuration conf) throws IOException {
    String scheme = uri.getScheme();
    String authority = uri.getAuthority();

    if (scheme == null) {                       // no scheme: file system type unknown, use the default
      return get(conf);
    }

    if (authority == null) {                       // no authority
      URI defaultUri = getDefaultUri(conf);       // fetch the default URI from the configuration
      if (scheme.equals(defaultUri.getScheme())    // if scheme matches default
          && defaultUri.getAuthority() != null) {  // and the default URI has an authority (e.g. host:port)
        return get(defaultUri, conf);              // use the default URI instead
      }
    }

    String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);
    // bypass the cache when it is disabled for this scheme
    if (conf.getBoolean(disableCacheName, false)) {
      return createFileSystem(uri, conf);
    }

    return CACHE.get(uri, conf);
  }
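The scheme and authority checks at the top of get() can be tried out in isolation with java.net.URI. The sketch below mirrors only that branch logic; UriResolveSketch and resolve are hypothetical names, and the default URI is passed in directly instead of being read from a Configuration.

```java
import java.net.URI;

class UriResolveSketch {
    // Returns the URI that would actually be used to look up the file system,
    // following the same two fallbacks as the get() method above.
    static URI resolve(URI uri, URI defaultUri) {
        if (uri.getScheme() == null) {
            return defaultUri;                       // no scheme: use the default file system
        }
        if (uri.getAuthority() == null
                && uri.getScheme().equals(defaultUri.getScheme())
                && defaultUri.getAuthority() != null) {
            return defaultUri;                       // same scheme, borrow the default authority
        }
        return uri;                                  // fully specified, or a different scheme
    }
}
```

With a default of hdfs://namenode:9000, both /tmp/x and hdfs:///tmp/x resolve to the default, while file:///tmp/x is left unchanged because its scheme differs. Here "authority" means the host:port part of the URI, not a permission.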
//FileSystem.createFileSystem()
  private static FileSystem createFileSystem(URI uri, Configuration conf
      ) throws IOException {
    // this is where the file system is really created: look up the implementing class for the scheme
    Class<?> clazz = conf.getClass("fs." + uri.getScheme() + ".impl", null);
    if (clazz == null) {
      throw new IOException("No FileSystem for scheme: " + uri.getScheme());
    }
    // instantiate the class through reflection, given only the class object
    FileSystem fs = (FileSystem)ReflectionUtils.newInstance(clazz, conf);
    fs.initialize(uri, conf);
    return fs;
  }
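The lookup-then-reflect step can be sketched without Hadoop as below. Everything here is hypothetical: SketchFs and SketchLocal stand in for FileSystem and one implementation, and a plain Map stands in for Configuration. Only the "fs.<scheme>.impl" key convention and the error message are taken from the code above.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the FileSystem type.
interface SketchFs {
    String name();
}

// Hypothetical concrete implementation registered for the "file" scheme.
class SketchLocal implements SketchFs {
    public String name() { return "local"; }
}

class ReflectiveFactorySketch {
    // Hypothetical configuration: maps "fs.<scheme>.impl" to a class name.
    static final Map<String, String> CONF = new HashMap<>();
    static {
        CONF.put("fs.file.impl", SketchLocal.class.getName());
    }

    // Looks up the class name for the scheme and instantiates it by reflection.
    static SketchFs create(String scheme) throws IOException {
        String className = CONF.get("fs." + scheme + ".impl");
        if (className == null) {
            throw new IOException("No FileSystem for scheme: " + scheme);
        }
        try {
            Class<?> clazz = Class.forName(className);
            return (SketchFs) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IOException("Cannot instantiate " + className, e);
        }
    }
}
```

The benefit of this design is that supporting a new scheme needs only a new configuration entry and a class on the classpath; the factory itself stays unchanged.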
//Cache.get()
//builds the Key -> identified by scheme, authority, and username

   private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException{

     FileSystem fs;
     synchronized (this) {
       fs = map.get(key);
     }
     if (fs != null) {
       return fs;
     }

     fs = createFileSystem(uri, conf);
     // from here on, synchronization matters
     synchronized (this) { // refetch the lock again
       FileSystem oldfs = map.get(key);
       if (oldfs != null) { // another thread created the same file system while this one was being created
         fs.close(); // close the new file system
         return oldfs;  // return the old file system
       }

       // now insert the new file system into the map
       if (map.isEmpty() && !clientFinalizer.isAlive()) {
         Runtime.getRuntime().addShutdownHook(clientFinalizer);
       }
       fs.key = key;
       map.put(key, fs);   // add it to the map
       if (conf.getBoolean("fs.automatic.close", true)) {
         toAutoClose.add(key);       // remember the key so the file system is closed automatically
       }
       return fs;
     }
   }
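The locking pattern in getInternal() (check under the lock, create outside the lock, then check again before inserting) can be sketched generically as below. CreateOutsideLockCache and its discarded counter are hypothetical; the real cache closes the losing file system rather than counting it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class CreateOutsideLockCache<K, V> {
    private final Map<K, V> map = new HashMap<>();
    int discarded = 0; // how many freshly built values lost the race (illustration only)

    V get(K key, Supplier<V> factory) {
        V value;
        synchronized (this) {
            value = map.get(key);      // first check, under the lock
        }
        if (value != null) {
            return value;
        }

        V created = factory.get();     // slow step, done without holding the lock

        synchronized (this) {
            V existing = map.get(key); // second check: someone may have inserted meanwhile
            if (existing != null) {
                discarded++;           // throw away our copy and reuse the earlier one
                return existing;
            }
            map.put(key, created);
            return created;
        }
    }
}
```

Creating the value outside the lock keeps slow initialization from blocking other callers, at the cost of occasionally building an object that is thrown away.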

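The file systems here are only local file systems. HDFS is dedicated to handling the distributed file system, and it inherits from AbstractFileSystem so that it can be reached through the same interface. The complete source tree contains hadoop-common-project and hadoop-hdfs-project; in this version HDFS lives in hadoop-hdfs-project rather than in common, so the version handed out by the instructor is presumably a transitional one.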