Tom White - Hadoop: The Definitive Guide, 4th edition - 2015 (811394), page 17
It returns true if the directory (and all parent directories) was (were) successfully created.

Often, you don’t need to explicitly create a directory, because writing a file by calling create() will automatically create any parent directories.

Querying the Filesystem

File metadata: FileStatus

An important feature of any filesystem is the ability to navigate its directory structure and retrieve information about the files and directories that it stores. The FileStatus class encapsulates filesystem metadata for files and directories, including file length, block size, replication, modification time, ownership, and permission information.

The method getFileStatus() on FileSystem provides a way of getting a FileStatus object for a single file or directory.
Example 3-5 shows an example of its use.

Example 3-5. Demonstrating file status information

    import static org.hamcrest.Matchers.is;
    import static org.hamcrest.Matchers.lessThanOrEqualTo;
    import static org.junit.Assert.assertThat;

    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.io.OutputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.MiniDFSCluster;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    public class ShowFileStatusTest {

      private MiniDFSCluster cluster; // use an in-process HDFS cluster for testing
      private FileSystem fs;

      @Before
      public void setUp() throws IOException {
        Configuration conf = new Configuration();
        if (System.getProperty("test.build.data") == null) {
          System.setProperty("test.build.data", "/tmp");
        }
        cluster = new MiniDFSCluster.Builder(conf).build();
        fs = cluster.getFileSystem();
        // Writing the file also creates its parent directory, /dir
        OutputStream out = fs.create(new Path("/dir/file"));
        out.write("content".getBytes("UTF-8"));
        out.close();
      }

      @After
      public void tearDown() throws IOException {
        if (fs != null) { fs.close(); }
        if (cluster != null) { cluster.shutdown(); }
      }

      @Test(expected = FileNotFoundException.class)
      public void throwsFileNotFoundForNonExistentFile() throws IOException {
        fs.getFileStatus(new Path("no-such-file"));
      }

      @Test
      public void fileStatusForFile() throws IOException {
        Path file = new Path("/dir/file");
        FileStatus stat = fs.getFileStatus(file);
        assertThat(stat.getPath().toUri().getPath(), is("/dir/file"));
        assertThat(stat.isDirectory(), is(false));
        assertThat(stat.getLen(), is(7L)); // "content" is 7 bytes in UTF-8
        assertThat(stat.getModificationTime(),
            is(lessThanOrEqualTo(System.currentTimeMillis())));
        assertThat(stat.getReplication(), is((short) 1));
        assertThat(stat.getBlockSize(), is(128 * 1024 * 1024L));
        assertThat(stat.getOwner(), is(System.getProperty("user.name")));
        assertThat(stat.getGroup(), is("supergroup"));
        assertThat(stat.getPermission().toString(), is("rw-r--r--"));
      }

      @Test
      public void fileStatusForDirectory() throws IOException {
        Path dir = new Path("/dir");
        FileStatus stat = fs.getFileStatus(dir);
        assertThat(stat.getPath().toUri().getPath(), is("/dir"));
        assertThat(stat.isDirectory(), is(true));
        assertThat(stat.getLen(), is(0L));
        assertThat(stat.getModificationTime(),
            is(lessThanOrEqualTo(System.currentTimeMillis())));
        assertThat(stat.getReplication(), is((short) 0));
        assertThat(stat.getBlockSize(), is(0L));
        assertThat(stat.getOwner(), is(System.getProperty("user.name")));
        assertThat(stat.getGroup(), is("supergroup"));
        assertThat(stat.getPermission().toString(), is("rwxr-xr-x"));
      }
    }

If no file or directory exists, a FileNotFoundException is thrown.
However, if you are interested only in the existence of a file or directory, the exists() method on FileSystem is more convenient:

    public boolean exists(Path f) throws IOException

Listing files

Finding information on a single file or directory is useful, but you also often need to be able to list the contents of a directory. That’s what FileSystem’s listStatus() methods are for:

    public FileStatus[] listStatus(Path f) throws IOException
    public FileStatus[] listStatus(Path f, PathFilter filter) throws IOException
    public FileStatus[] listStatus(Path[] files) throws IOException
    public FileStatus[] listStatus(Path[] files, PathFilter filter)
        throws IOException

When the argument is a file, the simplest variant returns an array of FileStatus objects of length 1.
When the argument is a directory, it returns zero or more FileStatus objects representing the files and directories contained in the directory.

Overloaded variants allow a PathFilter to be supplied to restrict the files and directories to match. You will see an example of this in the section “PathFilter” on page 67. Finally, if you specify an array of paths, the result is a shortcut for calling the equivalent single-path listStatus() method for each path in turn and accumulating the FileStatus object arrays in a single array. This can be useful for building up lists of input files to process from distinct parts of the filesystem tree.
Example 3-6 is a simple demonstration of this idea. Note the use of stat2Paths() in Hadoop’s FileUtil for turning an array of FileStatus objects into an array of Path objects.

Example 3-6. Showing the file statuses for a collection of paths in a Hadoop filesystem

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class ListStatus {

      public static void main(String[] args) throws Exception {
        // The first argument selects the filesystem; every argument is listed
        String uri = args[0];
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);

        Path[] paths = new Path[args.length];
        for (int i = 0; i < paths.length; i++) {
          paths[i] = new Path(args[i]);
        }

        // One call accumulates the listings of all paths into a single array
        FileStatus[] status = fs.listStatus(paths);
        Path[] listedPaths = FileUtil.stat2Paths(status);
        for (Path p : listedPaths) {
          System.out.println(p);
        }
      }
    }

We can use this program to find the union of directory listings for a collection of paths:

    % hadoop ListStatus hdfs://localhost/ hdfs://localhost/user/tom
    hdfs://localhost/user
    hdfs://localhost/user/tom/books
    hdfs://localhost/user/tom/quangle.txt

File patterns

It is a common requirement to process sets of files in a single operation.
For example, a MapReduce job for log processing might analyze a month’s worth of files contained in a number of directories. Rather than having to enumerate each file and directory to specify the input, it is convenient to use wildcard characters to match multiple files with a single expression, an operation that is known as globbing. Hadoop provides two FileSystem methods for processing globs:

    public FileStatus[] globStatus(Path pathPattern) throws IOException
    public FileStatus[] globStatus(Path pathPattern, PathFilter filter)
        throws IOException

The globStatus() methods return an array of FileStatus objects whose paths match the supplied pattern, sorted by path.
An optional PathFilter can be specified to restrict the matches further.

Hadoop supports the same set of glob characters as the Unix bash shell (see Table 3-2).

Table 3-2. Glob characters and their meanings

    Glob     Name                      Matches
    *        asterisk                  Matches zero or more characters
    ?        question mark             Matches a single character
    [ab]     character class           Matches a single character in the set {a, b}
    [^ab]    negated character class   Matches a single character that is not in the set {a, b}
    [a-b]    character range           Matches a single character in the (closed) range [a, b], where a is lexicographically less than or equal to b
    [^a-b]   negated character range   Matches a single character that is not in the (closed) range [a, b], where a is lexicographically less than or equal to b
    {a,b}    alternation               Matches either expression a or b
    \c       escaped character         Matches character c when it is a metacharacter

Imagine that logfiles are stored in a directory structure organized hierarchically by date.
So, logfiles for the last day of 2007 would go in a directory named /2007/12/31, for example. Suppose that the full file listing is:

    /
    ├── 2007/
    │   └── 12/
    │       ├── 30/
    │       └── 31/
    └── 2008/
        └── 01/
            ├── 01/
            └── 02/

Here are some file globs and their expansions:

    Glob                 Expansion
    /*                   /2007 /2008
    /*/*                 /2007/12 /2008/01
    /*/12/*              /2007/12/30 /2007/12/31
    /200?                /2007 /2008
    /200[78]             /2007 /2008
    /200[7-8]            /2007 /2008
    /200[^01234569]      /2007 /2008
    /*/*/{31,01}         /2007/12/31 /2008/01/01
    /*/*/3{0,1}          /2007/12/30 /2007/12/31
    /*/{12/31,01/01}     /2007/12/31 /2008/01/01

PathFilter

Glob patterns are not always powerful enough to describe a set of files you want to access.
For example, it is not generally possible to exclude a particular file using a glob pattern. The listStatus() and globStatus() methods of FileSystem take an optional PathFilter, which allows programmatic control over matching:

    package org.apache.hadoop.fs;

    public interface PathFilter {
      boolean accept(Path path);
    }

PathFilter is the equivalent of java.io.FileFilter for Path objects rather than File objects.

Example 3-7 shows a PathFilter for excluding paths that match a regular expression.

Example 3-7. A PathFilter for excluding paths that match a regular expression

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

    public class RegexExcludePathFilter implements PathFilter {

      private final String regex;

      public RegexExcludePathFilter(String regex) {
        this.regex = regex;
      }

      // Accept a path only if its full string form does NOT match the regex
      public boolean accept(Path path) {
        return !path.toString().matches(regex);
      }
    }

The filter passes only those files that don’t match the regular expression.
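One detail of Example 3-7 is easy to miss: String.matches() succeeds only when the regular expression matches the entire string, and the ListStatus output earlier shows that listed paths print in fully qualified form (hdfs://localhost/...). The following is a minimal sketch of the same exclusion logic, using plain strings in place of Path objects so that it runs without Hadoop. The class name RegexExcludeSketch, its helper methods, and the sample paths are hypothetical and are not part of Hadoop's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical stand-in for the logic of RegexExcludePathFilter, operating on
 * path strings rather than org.apache.hadoop.fs.Path objects.
 */
public class RegexExcludeSketch {

    /** Mirrors accept(): keep a path only if the WHOLE string does not match. */
    public static Predicate<String> excluding(String regex) {
        return path -> !path.matches(regex);
    }

    /** Applies the exclusion filter to candidate paths, preserving their order. */
    public static List<String> filter(List<String> paths, String regex) {
        Predicate<String> accept = excluding(regex);
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            if (accept.test(p)) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Assumed sample data: qualified paths such as a glob over /2007/*/* might yield
        List<String> candidates = Arrays.asList(
            "hdfs://localhost/2007/12/30",
            "hdfs://localhost/2007/12/31");

        // The leading .* absorbs the scheme and host, so the last path is excluded
        System.out.println(filter(candidates, "^.*/2007/12/31$"));
        // prints [hdfs://localhost/2007/12/30]

        // Without the leading .*, no full string matches, so nothing is excluded
        System.out.println(filter(candidates, "/2007/12/31"));
        // prints [hdfs://localhost/2007/12/30, hdfs://localhost/2007/12/31]
    }
}
```

In practice this suggests writing exclusion patterns against the whole qualified path, for instance with a leading .*, rather than against the bare directory name.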