API Reference
User Functions
fsspec.open_files (urlpath[, mode, …]) |
Given a path or paths, return a list of OpenFile objects. |
fsspec.open (urlpath[, mode, compression, …]) |
Given a path or paths, return one OpenFile object. |
fsspec.filesystem (protocol, **storage_options) |
Instantiate filesystems for given protocol and arguments |
fsspec.get_filesystem_class (protocol) |
Fetch named protocol implementation from the registry |
fsspec.get_mapper (url[, check, create]) |
Create key-value interface for given URL and options |
fsspec.fuse.run |
-
fsspec.open_files(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, name_function=None, num=1, protocol=None, newline=None, **kwargs)
Given a path or paths, return a list of OpenFile objects.
For writing, a str path must contain the “*” character, which will be filled in by increasing numbers, e.g., “part*” -> “part0”, “part1” if num=2.
For either reading or writing, you can instead provide an explicit list of paths.
Parameters: urlpath: string or list
Absolute or relative filepath(s). Prefix with a protocol like s3:// to read from alternative filesystems. To read from multiple files you can pass a globstring or a list of paths, with the caveat that they must all have the same protocol.
mode: ‘rb’, ‘wt’, etc.
compression: string
Compression to use. See dask.bytes.compression.files for options.
encoding: str
For text mode only
errors: None or str
Passed to TextIOWrapper in text mode
name_function: function or None
If opening a set of files for writing, those files do not yet exist, so we need to generate their names by formatting the urlpath for each sequence number.
num: int [1]
If in writing mode, the number of files we expect to create (passed to name_function).
protocol: str or None
If given, overrides the protocol found in the URL.
newline: bytes or None
Used for line terminator in text mode. If None, uses system default; if blank, uses no translation.
**kwargs: dict
Extra options that make sense to a particular storage connection, e.g. host, port, username, password, etc.
Returns: List of OpenFile objects.
Examples
>>> files = open_files('2015-*-*.csv') # doctest: +SKIP
>>> files = open_files(
...     's3://bucket/2015-*-*.csv.gz', compression='gzip'
... ) # doctest: +SKIP
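The write-then-read pattern described above can be sketched as follows. This assumes fsspec is installed and that a local temporary directory is writable; the `part-*.txt` pattern and the chunk contents are illustrative only.

```python
import os
import tempfile

import fsspec

with tempfile.TemporaryDirectory() as tmp:
    pattern = os.path.join(tmp, "part-*.txt")

    # Writing: the "*" is replaced once per file, num times in total.
    files = fsspec.open_files(pattern, mode="wt", num=3)
    for i, of in enumerate(files):
        with of as f:
            f.write(f"chunk {i}\n")

    # Inspect what was actually created instead of assuming the names.
    written = sorted(os.listdir(tmp))

    # Reading: the same string now acts as a glob over existing files.
    contents = []
    for of in fsspec.open_files(pattern, mode="rt"):
        with of as f:
            contents.append(f.read())
```

Listing the directory is a simple way to see which names the default name_function produced before relying on them.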
-
fsspec.open(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, protocol=None, newline=None, **kwargs)
Given a path or paths, return one OpenFile object.
Parameters: urlpath: string or list
Absolute or relative filepath. Prefix with a protocol like s3:// to read from alternative filesystems. Should not include glob character(s).
mode: ‘rb’, ‘wt’, etc.
compression: string
Compression to use. See dask.bytes.compression.files for options.
encoding: str
For text mode only
errors: None or str
Passed to TextIOWrapper in text mode
protocol: str or None
If given, overrides the protocol found in the URL.
newline: bytes or None
Used for line terminator in text mode. If None, uses system default; if blank, uses no translation.
**kwargs: dict
Extra options that make sense to a particular storage connection, e.g. host, port, username, password, etc.
Returns: OpenFile object.
Examples
>>> openfile = open('2015-01-01.csv') # doctest: +SKIP
>>> openfile = open(
...     's3://bucket/2015-01-01.csv.gz',
...     compression='gzip'
... ) # doctest: +SKIP
>>> with openfile as f:
...     df = pd.read_csv(f) # doctest: +SKIP
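As a self-contained variant of the examples above that needs no remote storage, the following sketch layers text mode and gzip compression over the built-in in-memory filesystem. It assumes fsspec is installed; the `memory://open-demo/hello.txt.gz` path is made up for illustration.

```python
import fsspec

url = "memory://open-demo/hello.txt.gz"

# Text mode and gzip are layered on top of the binary in-memory file.
with fsspec.open(url, mode="wt", compression="gzip") as f:
    f.write("hello\n")

# Reading back applies the same layers in reverse.
with fsspec.open(url, mode="rt", compression="gzip") as f:
    text = f.read()
```

Nothing is opened until the `with` block is entered, which is why the object returned by `fsspec.open` can be created ahead of time and used later.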
-
fsspec.filesystem(protocol, **storage_options)
Instantiate filesystems for given protocol and arguments
storage_options are specific to the protocol being chosen, and are passed directly to the class.
-
fsspec.get_filesystem_class(protocol)
Fetch named protocol implementation from the registry
The dict known_implementations maps protocol names to the locations of classes implementing the corresponding file-system. When used for the first time, appropriate imports will happen and the class will be placed in the registry. All subsequent calls will fetch directly from the registry.
Some protocol implementations require additional dependencies, and so the import may fail. In this case, the string in the “err” field of the known_implementations entry will be given as the error message.
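The relationship between the two functions above can be sketched with the in-memory protocol, which needs no extra dependencies. This assumes fsspec is installed; the variable names are illustrative.

```python
import fsspec
from fsspec.spec import AbstractFileSystem

# Look up the implementing class by protocol name.
cls = fsspec.get_filesystem_class("memory")

# Instantiate a filesystem for the same protocol; any storage_options
# would be passed through to the class (none are needed here).
fs = fsspec.filesystem("memory")

is_instance = isinstance(fs, cls)
is_spec_subclass = issubclass(cls, AbstractFileSystem)
```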
-
fsspec.get_mapper(url, check=False, create=False, **kwargs)
Create key-value interface for given URL and options
The URL will be of the form “protocol://location” and point to the root of the mapper required. All keys will be file-names below this location, and their values the contents of each key.
Parameters: url: str
Root URL of mapping
check: bool
Whether to attempt to read from the location before instantiation, to check that the mapping does exist
create: bool
Whether to make the directory corresponding to the root before instantiating
Returns: FSMap instance, the dict-like key-value store.
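A minimal sketch of the key-value interface, assuming fsspec is installed and using the in-memory filesystem so that nothing touches disk. The root `memory://mapper-demo` and the keys are made up for illustration.

```python
import fsspec

# The URL points at the root of the mapping.
m = fsspec.get_mapper("memory://mapper-demo")

# Values must be bytes; keys become file names below the root.
m["a"] = b"1"
m["sub/b"] = b"22"

keys = sorted(m)
value = m["sub/b"]
```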
Base Classes
fsspec.spec.AbstractFileSystem (*args, …) |
An abstract super-class for pythonic file-systems |
fsspec.spec.Transaction (fs) |
Filesystem transaction write context |
fsspec.spec.AbstractBufferedFile (fs, path[, …]) |
Convenient class to derive from to provide buffering |
fsspec.FSMap (root, fs[, check, create]) |
Wrap a FileSystem instance as a mutable mapping. |
fsspec.core.OpenFile (fs, path[, mode, …]) |
File-like object to be used in a context |
fsspec.core.BaseCache (blocksize, fetcher, size) |
Pass-through cache: doesn’t keep anything, calls every time |
-
class fsspec.spec.AbstractFileSystem(*args, **storage_options)
An abstract super-class for pythonic file-systems
Implementations are expected to be compatible with or, better, subclass from here.
Attributes
transaction |
A context within which files are committed together upon exit |
Methods
cat (path) |
Get the content of a file |
checksum (path) |
Unique value for current version of file |
clear_instance_cache () |
Clear the cache of filesystem instances. |
copy (path1, path2, **kwargs) |
Copy within two locations in the filesystem |
cp (path1, path2, **kwargs) |
Alias of FilesystemSpec.copy. |
current () |
Return the most recently created FileSystem |
delete (path[, recursive, maxdepth]) |
Alias of FilesystemSpec.rm. |
disk_usage (path[, total, maxdepth]) |
Alias of FilesystemSpec.du. |
download (rpath, lpath[, recursive]) |
Alias of FilesystemSpec.get. |
du (path[, total, maxdepth]) |
Space used by files within a path |
end_transaction () |
Finish write transaction, non-context version |
exists (path) |
Is there a file at the given path |
find (path[, maxdepth, withdirs]) |
List all files below path. |
get (rpath, lpath[, recursive]) |
Copy file to local. |
get_mapper (root[, check, create]) |
Create key/value store based on this file-system |
glob (path, **kwargs) |
Find files by glob-matching. |
head (path[, size]) |
Get the first size bytes from file |
info (path, **kwargs) |
Give details of entry at path |
invalidate_cache ([path]) |
Discard any cached directory information |
isdir (path) |
Is this entry directory-like? |
isfile (path) |
Is this entry file-like? |
listdir (path[, detail]) |
Alias of FilesystemSpec.ls. |
ls (path[, detail]) |
List objects at path. |
makedir (path[, create_parents]) |
Alias of FilesystemSpec.mkdir. |
makedirs (path[, exist_ok]) |
Recursively make directories |
mkdir (path[, create_parents]) |
Create directory entry at path |
mkdirs (path[, exist_ok]) |
Alias of FilesystemSpec.makedirs. |
move (path1, path2, **kwargs) |
Alias of FilesystemSpec.mv. |
mv (path1, path2, **kwargs) |
Move file from one location to another |
open (path[, mode, block_size, cache_options]) |
Return a file-like object from the filesystem |
put (lpath, rpath[, recursive]) |
Upload file from local |
read_block (fn, offset, length[, delimiter]) |
Read a block of bytes from |
rename (path1, path2, **kwargs) |
Alias of FilesystemSpec.mv. |
rm (path[, recursive, maxdepth]) |
Delete files. |
rmdir (path) |
Remove a directory, if empty |
size (path) |
Size in bytes of file |
start_transaction () |
Begin write transaction for deferring files, non-context version |
stat (path, **kwargs) |
Alias of FilesystemSpec.info. |
tail (path[, size]) |
Get the last size bytes from file |
touch (path[, truncate]) |
Create empty file, or update timestamp |
ukey (path) |
Hash of file properties, to tell if it has changed |
upload (lpath, rpath[, recursive]) |
Alias of FilesystemSpec.put. |
walk (path[, maxdepth]) |
Return all files below path |
-
class fsspec.spec.Transaction(fs)
Filesystem transaction write context
Gathers files for deferred commit or discard, so that several write operations can be finalized semi-atomically. This works by having this instance as the .transaction attribute of the given filesystem.
Methods
complete ([commit]) |
Finish transaction: commit or discard all deferred files |
start () |
Start a transaction on this FileSystem |
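The deferred-commit idea can be sketched with the local filesystem, using the .transaction attribute as a context manager. This assumes fsspec is installed; the file names and payload are illustrative, and whether partially written files are visible before the block exits depends on the backend and installed version, so the sketch only records that state rather than relying on it.

```python
import os
import tempfile

import fsspec

fs = fsspec.filesystem("file")

with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, name) for name in ("a.bin", "b.bin")]

    # Files opened for writing inside the block are gathered by the
    # transaction and finalized together when the block exits.
    with fs.transaction:
        for p in paths:
            with fs.open(p, "wb") as f:
                f.write(b"payload")
        # Recorded for inspection only; backend-dependent.
        visible_during = [os.path.exists(p) for p in paths]

    visible_after = [os.path.exists(p) for p in paths]
    data = [fs.cat(p) for p in paths]
```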
-
class fsspec.spec.AbstractBufferedFile(fs, path, mode='rb', block_size='default', autocommit=True, cache_type='readahead', cache_options=None, **kwargs)
Convenient class to derive from to provide buffering
In the case that the backend does not provide a pythonic file-like object already, this class contains much of the logic to build one. The only methods that need to be overridden are _upload_chunk, _initiate_upload and _fetch_range.
Attributes
closed |
|
Methods
close () |
Close file |
commit () |
Move from temp to final destination |
discard () |
Throw away temporary file |
fileno ($self, /) |
Returns underlying file descriptor if one exists. |
flush ([force]) |
Write buffered data to backend store. |
info () |
File information about this path |
isatty ($self, /) |
Return whether this is an ‘interactive’ stream. |
read ([length]) |
Return data from cache, or fetch pieces as necessary |
readable () |
Whether opened for reading |
readinto (b) |
mirrors builtin file’s readinto method |
readinto1 (b) |
|
readline () |
Read until first occurrence of newline character |
readlines () |
Return all data, split by the newline character |
readuntil ([char, blocks]) |
Return data between current position and first occurrence of char |
seek (loc[, whence]) |
Set current file location |
seekable () |
Whether is seekable (only in read mode) |
tell () |
Current file location |
truncate |
Truncate file to size bytes. |
writable () |
Whether opened for writing |
write (data) |
Write data to buffer. |
writelines ($self, lines, /) |
Write a list of lines to stream. |
-
flush(force=False)
Write buffered data to backend store.
Writes the current buffer, if it is larger than the block-size, or if the file is being closed.
Parameters: force: bool
When closing, write the last block even if it is smaller than blocks are allowed to be. Disallows further writing to this file.
-
read(length=-1)
Return data from cache, or fetch pieces as necessary
Parameters: length: int (-1)
Number of bytes to read; if <0, all remaining bytes.
-
readinto(b)
Mirrors builtin file’s readinto method
https://docs.python.org/3/library/io.html#io.RawIOBase.readinto
-
readline()
Read until first occurrence of newline character
Note that, because of character encoding, this is not necessarily a true line ending.
-
readuntil(char=b'\n', blocks=None)
Return data between current position and first occurrence of char
char is included in the output, except if the end of the file is encountered first.
Parameters: char: bytes
Thing to find
blocks: None or int
How much to read in each go. Defaults to file blocksize - which may mean a new read on every call.
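The behaviour described for readuntil can be illustrated with a standalone function over any seekable binary file. This is a sketch of the documented semantics, not the library's own code; the name `read_until` and the default block size of 64 are hypothetical, and a delimiter split across two blocks is not handled.

```python
import io


def read_until(f, char=b"\n", blocks=64):
    """Read from f up to and including the first occurrence of char."""
    out = []
    while True:
        start = f.tell()
        part = f.read(blocks)
        if not part:
            # End of file reached before char was found.
            break
        found = part.find(char)
        if found > -1:
            # Keep the delimiter, then rewind to just after it.
            out.append(part[: found + len(char)])
            f.seek(start + found + len(char))
            break
        out.append(part)
    return b"".join(out)


f = io.BytesIO(b"ab\ncd")
first = read_until(f, blocks=2)
rest = read_until(f, blocks=2)
```

A small `blocks` value, as here, means more reads per call, which is the trade-off the parameter description points at.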
-
-
class fsspec.FSMap(root, fs, check=False, create=False)
Wrap a FileSystem instance as a mutable mapping.
The keys of the mapping become files under the given root, and the values (which must be bytes) the contents of those files.
Parameters: root: string
prefix for all the files
fs: FileSystem instance
check: bool (=False)
performs a touch at the location, to check for write access.
Examples
>>> fs = FileSystem(**parameters) # doctest: +SKIP
>>> d = FSMap('my-data/path/', fs) # doctest: +SKIP
or, more likely
>>> d = fs.get_mapper('my-data/path/')
>>> d['loc1'] = b'Hello World' # doctest: +SKIP
>>> list(d.keys()) # doctest: +SKIP
['loc1']
>>> d['loc1'] # doctest: +SKIP
b'Hello World'
Methods
clear () |
Remove all keys below root - empties out mapping |
get (k[,d]) |
|
items () |
|
keys () |
|
pop (k[,d]) |
If key is not found, d is returned if given, otherwise KeyError is raised. |
popitem () |
as a 2-tuple; but raise KeyError if D is empty. |
setdefault (k[,d]) |
|
update ([E, ]**F) |
If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v |
values () |
|
-
class fsspec.core.OpenFile(fs, path, mode='rb', compression=None, encoding=None, errors=None, newline=None)
File-like object to be used in a context
Can layer (buffered) text-mode and compression over any file-system, which are typically binary-only.
These instances are safe to serialize, as the low-level file object is not created until invoked using with.
Parameters: fs: FileSystem
The file system to use for opening the file. Should match the interface of dask.bytes.local.LocalFileSystem.
path: str
Location to open
mode: str like ‘rb’, optional
Mode of the opened file
compression: str or None, optional
Compression to apply
encoding: str or None, optional
The encoding to use if opened in text mode.
errors: str or None, optional
How to handle encoding errors if opened in text mode.
newline: None or str
Passed to TextIOWrapper in text mode, how to handle line endings.
Methods
close () |
Close all encapsulated file objects |
open () |
Materialise this as a real open file without context |
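The point about serialization can be sketched by pickling an OpenFile before it is ever entered. This assumes fsspec is installed and uses the in-memory filesystem, whose contents are shared within one process; the path `memory://openfile-demo/x.txt` is made up for illustration.

```python
import pickle

import fsspec

url = "memory://openfile-demo/x.txt"

# No low-level file exists yet, so the object can be serialized.
of = fsspec.open(url, mode="wt")
restored = pickle.loads(pickle.dumps(of))

# The real file is only created when the context is entered.
with restored as f:
    f.write("hi")

with fsspec.open(url, mode="rt") as f:
    text = f.read()
```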
-
class fsspec.core.BaseCache(blocksize, fetcher, size)
Pass-through cache: doesn’t keep anything, calls every time
Acts as base class for other cachers
Parameters: blocksize: int
How far to read ahead in numbers of bytes
fetcher: func
Function of the form f(start, end) which gets bytes from remote as specified
size: int
How big this file is
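To make the three parameters concrete, here is a standalone sketch of a read-ahead cache built around a fetcher of the form f(start, end). It uses only the standard library; `ReadAheadSketch`, its `fetch` method and the `calls` counter are hypothetical and are not claimed to match the library's cache classes.

```python
class ReadAheadSketch:
    """Hypothetical read-ahead cache driven by fetcher(start, end)."""

    def __init__(self, blocksize, fetcher, size):
        self.blocksize = blocksize  # extra bytes to fetch on a miss
        self.fetcher = fetcher      # callable returning bytes[start:end]
        self.size = size            # total size of the remote file
        self.start = 0
        self.end = 0
        self.cache = b""
        self.calls = 0              # number of fetcher invocations

    def fetch(self, start, end):
        end = min(end, self.size)
        if start >= end:
            return b""
        if not (self.start <= start and end <= self.end):
            # Miss: fetch the requested range plus blocksize more bytes.
            self.start = start
            self.end = min(end + self.blocksize, self.size)
            self.cache = self.fetcher(self.start, self.end)
            self.calls += 1
        return self.cache[start - self.start : end - self.start]


data = bytes(range(100))


def fetcher(start, end):
    # Stand-in for a remote range request.
    return data[start:end]


cache = ReadAheadSketch(blocksize=16, fetcher=fetcher, size=len(data))
first = cache.fetch(0, 4)    # miss: triggers one fetcher call
second = cache.fetch(4, 10)  # served from the buffered bytes
```

A pass-through cache, by contrast, would call the fetcher on every request and keep nothing.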
Built-in Implementations
fsspec.implementations.ftp.FTPFileSystem (host) |
A filesystem over classic |
fsspec.implementations.hdfs.PyArrowHDFS |
|
fsspec.implementations.http.HTTPFileSystem ([…]) |
Simple File-System for fetching data via HTTP(S) |
fsspec.implementations.local.LocalFileSystem ([…]) |
Interface to files on local storage |
fsspec.implementations.memory.MemoryFileSystem (…) |
A filesystem based on a dict of BytesIO objects |
fsspec.implementations.sftp.SFTPFileSystem |
|
fsspec.implementations.webhdfs.WebHDFS (host) |
Interface to HDFS over HTTP |
fsspec.implementations.zip.ZipFileSystem ([…]) |
Read contents of ZIP archive as a file-system |
fsspec.implementations.cached.CachingFileSystem ([…]) |
Locally caching filesystem, layer over any other FS |
fsspec.implementations.cached.WholeFileCacheFileSystem ([…]) |
Caches whole remote files on first access |
-
class fsspec.implementations.ftp.FTPFileSystem(host, port=21, username=None, password=None, acct=None, block_size=None, tempdir='/tmp', timeout=30, **kwargs)
A filesystem over classic
Attributes
transaction |
A context within which files are committed together upon exit |
Methods
cat (path) |
Get the content of a file |
checksum (path) |
Unique value for current version of file |
clear_instance_cache () |
Clear the cache of filesystem instances. |
copy (path1, path2, **kwargs) |
Copy within two locations in the filesystem |
cp (path1, path2, **kwargs) |
Alias of FilesystemSpec.copy. |
current () |
Return the most recently created FileSystem |
delete (path[, recursive, maxdepth]) |
Alias of FilesystemSpec.rm. |
disk_usage (path[, total, maxdepth]) |
Alias of FilesystemSpec.du. |
download (rpath, lpath[, recursive]) |
Alias of FilesystemSpec.get. |
du (path[, total, maxdepth]) |
Space used by files within a path |
end_transaction () |
Finish write transaction, non-context version |
exists (path) |
Is there a file at the given path |
find (path[, maxdepth, withdirs]) |
List all files below path. |
get (rpath, lpath[, recursive]) |
Copy file to local. |
get_mapper (root[, check, create]) |
Create key/value store based on this file-system |
glob (path, **kwargs) |
Find files by glob-matching. |
head (path[, size]) |
Get the first size bytes from file |
info (path, **kwargs) |
Give details of entry at path |
invalidate_cache ([path]) |
Discard any cached directory information |
isdir (path) |
Is this entry directory-like? |
isfile (path) |
Is this entry file-like? |
listdir (path[, detail]) |
Alias of FilesystemSpec.ls. |
ls (path[, detail]) |
List objects at path. |
makedir (path[, create_parents]) |
Alias of FilesystemSpec.mkdir. |
makedirs (path[, exist_ok]) |
Recursively make directories |
mkdir (path, **kwargs) |
Create directory entry at path |
mkdirs (path[, exist_ok]) |
Alias of FilesystemSpec.makedirs. |
move (path1, path2, **kwargs) |
Alias of FilesystemSpec.mv. |
mv (path1, path2, **kwargs) |
Move file from one location to another |
open (path[, mode, block_size, cache_options]) |
Return a file-like object from the filesystem |
put (lpath, rpath[, recursive]) |
Upload file from local |
read_block (fn, offset, length[, delimiter]) |
Read a block of bytes from |
rename (path1, path2, **kwargs) |
Alias of FilesystemSpec.mv. |
rm (path[, recursive, maxdepth]) |
Delete files. |
rmdir (path) |
Remove a directory, if empty |
size (path) |
Size in bytes of file |
start_transaction () |
Begin write transaction for deferring files, non-context version |
stat (path, **kwargs) |
Alias of FilesystemSpec.info. |
tail (path[, size]) |
Get the last size bytes from file |
touch (path[, truncate]) |
Create empty file, or update timestamp |
ukey (path) |
Hash of file properties, to tell if it has changed |
upload (lpath, rpath[, recursive]) |
Alias of FilesystemSpec.put. |
walk (path[, maxdepth]) |
Return all files below path |
-
__init__(host, port=21, username=None, password=None, acct=None, block_size=None, tempdir='/tmp', timeout=30, **kwargs)
You can use _get_kwargs_from_urls to get some kwargs from a reasonable FTP url.
Authentication will be anonymous if username/password are not given.
Parameters: host: str
The remote server name/ip to connect to
port: int
Port to connect with
username: str or None
If authenticating, the user’s identifier
password: str or None
User’s password on the server, if using
acct: str or None
Some servers also need an “account” string for auth
block_size: int or None
If given, the read-ahead or write buffer size.
tempdir: str
Directory on remote to put temporary files when in a transaction
-
-
class fsspec.implementations.http.HTTPFileSystem(simple_links=True, block_size=None, same_scheme=True, size_policy=None, **storage_options)
Simple File-System for fetching data via HTTP(S)
ls() is implemented by loading the parent page and doing a regex match on the result. If simple_links=True, anything of the form “http(s)://server.com/stuff?thing=other” is also treated as a link; otherwise only links within HTML href tags will be used.
Attributes
transaction |
A context within which files are committed together upon exit |
Methods
cat (url) |
Get the content of a file |
checksum (path) |
Unique value for current version of file |
clear_instance_cache () |
Clear the cache of filesystem instances. |
copy (path1, path2, **kwargs) |
Copy within two locations in the filesystem |
cp (path1, path2, **kwargs) |
Alias of FilesystemSpec.copy. |
current () |
Return the most recently created FileSystem |
delete (path[, recursive, maxdepth]) |
Alias of FilesystemSpec.rm. |
disk_usage (path[, total, maxdepth]) |
Alias of FilesystemSpec.du. |
download (rpath, lpath[, recursive]) |
Alias of FilesystemSpec.get. |
du (path[, total, maxdepth]) |
Space used by files within a path |
end_transaction () |
Finish write transaction, non-context version |
exists (path) |
Is there a file at the given path |
find (path[, maxdepth, withdirs]) |
List all files below path. |
get (rpath, lpath[, recursive]) |
Copy file to local. |
get_mapper (root[, check, create]) |
Create key/value store based on this file-system |
glob (path, **kwargs) |
Find files by glob-matching. |
head (path[, size]) |
Get the first size bytes from file |
info (url, **kwargs) |
Get info of URL |
invalidate_cache ([path]) |
Discard any cached directory information |
isdir (path) |
Is this entry directory-like? |
isfile (path) |
Is this entry file-like? |
listdir (path[, detail]) |
Alias of FilesystemSpec.ls. |
ls (url[, detail]) |
List objects at path. |
makedir (path[, create_parents]) |
Alias of FilesystemSpec.mkdir. |
makedirs (path[, exist_ok]) |
Recursively make directories |
mkdir (path[, create_parents]) |
Create directory entry at path |
mkdirs (url) |
Make any intermediate directories to make path writable |
move (path1, path2, **kwargs) |
Alias of FilesystemSpec.mv. |
mv (path1, path2, **kwargs) |
Move file from one location to another |
open (path[, mode, block_size, cache_options]) |
Return a file-like object from the filesystem |
put (lpath, rpath[, recursive]) |
Upload file from local |
read_block (fn, offset, length[, delimiter]) |
Read a block of bytes from |
rename (path1, path2, **kwargs) |
Alias of FilesystemSpec.mv. |
rm (path[, recursive, maxdepth]) |
Delete files. |
rmdir (path) |
Remove a directory, if empty |
size (path) |
Size in bytes of file |
start_transaction () |
Begin write transaction for deferring files, non-context version |
stat (path, **kwargs) |
Alias of FilesystemSpec.info. |
tail (path[, size]) |
Get the last size bytes from file |
touch (path[, truncate]) |
Create empty file, or update timestamp |
ukey (url) |
Unique identifier; assume HTTP files are static, unchanging |
upload (lpath, rpath[, recursive]) |
Alias of FilesystemSpec.put. |
walk (path[, maxdepth]) |
Return all files below path |
-
__init__(simple_links=True, block_size=None, same_scheme=True, size_policy=None, **storage_options)
Parameters: block_size: int
Blocks to read bytes; if 0, will default to raw requests file-like objects instead of HTTPFile instances
simple_links: bool
If True, will consider both HTML <a> tags and anything that looks like a URL; if False, will consider only the former.
same_scheme: True
When doing ls/glob, if this is True, only consider paths that have http/https matching the input URLs.
size_policy: this argument is deprecated
storage_options: key-value
May be credentials, e.g., {‘auth’: (‘username’, ‘pword’)} or any other parameters passed on to requests
-
-
class fsspec.implementations.local.LocalFileSystem(auto_mkdir=True, **kwargs)
Interface to files on local storage
Parameters: auto_mkdir: bool
Whether, when opening a file, the directory containing it should be created (if it doesn’t already exist). This is assumed by pyarrow code.
Attributes
transaction |
A context within which files are committed together upon exit |
Methods
cat (path) |
Get the content of a file |
checksum (path) |
Unique value for current version of file |
clear_instance_cache () |
Clear the cache of filesystem instances. |
copy (path1, path2, **kwargs) |
Copy within two locations in the filesystem |
cp (path1, path2, **kwargs) |
Alias of FilesystemSpec.copy. |
current () |
Return the most recently created FileSystem |
delete (path[, recursive, maxdepth]) |
Alias of FilesystemSpec.rm. |
disk_usage (path[, total, maxdepth]) |
Alias of FilesystemSpec.du. |
download (rpath, lpath[, recursive]) |
Alias of FilesystemSpec.get. |
du (path[, total, maxdepth]) |
Space used by files within a path |
end_transaction () |
Finish write transaction, non-context version |
exists (path) |
Is there a file at the given path |
find (path[, maxdepth, withdirs]) |
List all files below path. |
get (path1, path2, **kwargs) |
Copy file to local. |
get_mapper (root[, check, create]) |
Create key/value store based on this file-system |
glob (path, **kargs) |
Find files by glob-matching. |
head (path[, size]) |
Get the first size bytes from file |
info (path, **kwargs) |
Give details of entry at path |
invalidate_cache ([path]) |
Discard any cached directory information |
isdir (path) |
Is this entry directory-like? |
isfile (path) |
Is this entry file-like? |
listdir (path[, detail]) |
Alias of FilesystemSpec.ls. |
ls (path[, detail]) |
List objects at path. |
makedir (path[, create_parents]) |
Alias of FilesystemSpec.mkdir. |
makedirs (path[, exist_ok]) |
Recursively make directories |
mkdir (path[, create_parents]) |
Create directory entry at path |
mkdirs (path[, exist_ok]) |
Alias of FilesystemSpec.makedirs. |
move (path1, path2, **kwargs) |
Alias of FilesystemSpec.mv. |
mv (path1, path2, **kwargs) |
Move file from one location to another |
open (path[, mode, block_size, cache_options]) |
Return a file-like object from the filesystem |
put (path1, path2, **kwargs) |
Upload file from local |
read_block (fn, offset, length[, delimiter]) |
Read a block of bytes from |
rename (path1, path2, **kwargs) |
Alias of FilesystemSpec.mv. |
rm (path[, recursive, maxdepth]) |
Delete files. |
rmdir (path) |
Remove a directory, if empty |
size (path) |
Size in bytes of file |
start_transaction () |
Begin write transaction for deferring files, non-context version |
stat (path, **kwargs) |
Alias of FilesystemSpec.info. |
tail (path[, size]) |
Get the last size bytes from file |
touch (path, **kwargs) |
Create empty file, or update timestamp |
ukey (path) |
Hash of file properties, to tell if it has changed |
upload (lpath, rpath[, recursive]) |
Alias of FilesystemSpec.put. |
walk (path[, maxdepth]) |
Return all files below path |
-
get(path1, path2, **kwargs)
Copy file to local.
Possible extension: maybe should be able to copy to any file-system (streaming through local).
-
glob(path, **kargs)
Find files by glob-matching.
If the path ends with ‘/’ and does not contain “*”, it is essentially the same as ls(path), returning only files.
We support "**", "?" and "[..]".
kwargs are passed to ls.
-
info(path, **kwargs)
Give details of entry at path
Returns a single dictionary, with exactly the same information as ls would with detail=True.
The default implementation should call ls and could be overridden by a shortcut. kwargs are passed on to ls().
Some file systems might not be able to measure the file’s size, in which case, the returned dict will include 'size': None.
Returns: dict with keys: name (full path in the FS), size (in bytes), type (file, directory, or something else) and other FS-specific keys.
-
ls(path, detail=False)
List objects at path.
This should include subdirectories and files at that location. The difference between a file and a directory must be clear when details are requested.
The specific keys, or perhaps a FileInfo class, or similar, is TBD, but must be consistent across implementations. Must include:
- full path to the entry (without protocol)
- size of the entry, in bytes. If the value cannot be determined, will be None.
- type of entry, “file”, “directory” or other
Additional information may be present, appropriate to the file-system, e.g., generation, checksum, etc.
May use refresh=True|False to allow use of self._ls_from_cache to check for a saved listing and avoid calling the backend. This would be common where listing may be expensive.
Parameters: path: str
detail: bool
if True, gives a list of dictionaries, where each is the same as the result of info(path). If False, gives a list of paths (str).
kwargs: may have additional backend-specific options, such as version information
Returns: List of strings if detail is False, or list of directory information dicts if detail is True.
-
makedirs(path, exist_ok=False)
Recursively make directories
Creates directory at path and any intervening required directories. Raises exception if, for instance, the path already exists but is a file.
Parameters: path: str
leaf directory name
exist_ok: bool (False)
If False, will error if the target already exists
-
mkdir(path, create_parents=True, **kwargs)
Create directory entry at path
For systems that don’t have true directories, may create an entry for this instance only and not touch the real filesystem
Parameters: path: str
location
create_parents: bool
if True, this is equivalent to makedirs
kwargs:
may be permissions, etc.
-
rm(path, recursive=False, maxdepth=None)
Delete files.
Parameters: path: str or list of str
File(s) to delete.
recursive: bool
If file(s) are directories, recursively delete contents and then also remove the directory
maxdepth: int or None
Depth to pass to walk for finding files to delete, if recursive. If None, there will be no limit and infinite recursion may be possible.
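The methods documented above can be exercised together against a temporary directory. This is a minimal sketch, assuming fsspec is installed; the directory layout and the file name `data.bin` are illustrative only.

```python
import os
import tempfile

import fsspec

fs = fsspec.filesystem("file")

with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "a", "b")

    # Create the leaf directory and any missing parents.
    fs.makedirs(nested, exist_ok=True)

    target = os.path.join(nested, "data.bin")
    with fs.open(target, "wb") as f:
        f.write(b"12345")

    # detail=True gives one dict per entry, like info() for each path.
    details = fs.ls(nested, detail=True)
    info = fs.info(target)

    # Remove the directory tree, contents first.
    fs.rm(os.path.join(tmp, "a"), recursive=True)
    gone = not fs.exists(target)
```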
-
-
class fsspec.implementations.memory.MemoryFileSystem(*args, **storage_options)
A filesystem based on a dict of BytesIO objects
Attributes
transaction |
A context within which files are committed together upon exit |
Methods
cat (path) |
Get the content of a file |
checksum (path) |
Unique value for current version of file |
clear_instance_cache () |
Clear the cache of filesystem instances. |
copy (path1, path2, **kwargs) |
Copy within two locations in the filesystem |
cp (path1, path2, **kwargs) |
Alias of FilesystemSpec.copy. |
current () |
Return the most recently created FileSystem |
delete (path[, recursive, maxdepth]) |
Alias of FilesystemSpec.rm. |
disk_usage (path[, total, maxdepth]) |
Alias of FilesystemSpec.du. |
download (rpath, lpath[, recursive]) |
Alias of FilesystemSpec.get. |
du (path[, total, maxdepth]) |
Space used by files within a path |
end_transaction () |
Finish write transaction, non-context version |
exists (path) |
Is there a file at the given path |
find (path[, maxdepth, withdirs]) |
List all files below path. |
get (rpath, lpath[, recursive]) |
Copy file to local. |
get_mapper (root[, check, create]) |
Create key/value store based on this file-system |
glob (path, **kwargs) |
Find files by glob-matching. |
head (path[, size]) |
Get the first size bytes from file |
info (path, **kwargs) |
Give details of entry at path |
invalidate_cache ([path]) |
Discard any cached directory information |
isdir (path) |
Is this entry directory-like? |
isfile (path) |
Is this entry file-like? |
listdir (path[, detail]) |
Alias of FilesystemSpec.ls. |
ls (path[, detail]) |
List objects at path. |
makedir (path[, create_parents]) |
Alias of FilesystemSpec.mkdir. |
makedirs (path[, exist_ok]) |
Recursively make directories |
mkdir (path) |
Create directory entry at path |
mkdirs (path[, exist_ok]) |
Alias of FilesystemSpec.makedirs. |
move (path1, path2, **kwargs) |
Alias of FilesystemSpec.mv. |
mv (path1, path2, **kwargs) |
Move file from one location to another |
open (path[, mode, block_size, cache_options]) |
Return a file-like object from the filesystem |
put (lpath, rpath[, recursive]) |
Upload file from local |
read_block (fn, offset, length[, delimiter]) |
Read a block of bytes from |
rename (path1, path2, **kwargs) |
Alias of FilesystemSpec.mv. |
rm (path[, recursive, maxdepth]) |
Delete files. |
rmdir (path) |
Remove a directory, if empty |
size (path) |
Size in bytes of the file at path |
start_transaction () |
Begin write transaction for deferring files, non-context version |
stat (path, **kwargs) |
Alias of FilesystemSpec.info. |
tail (path[, size]) |
Get the last size bytes from file |
touch (path[, truncate]) |
Create empty file, or update timestamp |
ukey (path) |
Hash of file properties, to tell if it has changed |
upload (lpath, rpath[, recursive]) |
Alias of FilesystemSpec.put. |
walk (path[, maxdepth]) |
Return all files below path |
-
__init__(*args, **storage_options)
Create and configure file-system instance
Instances may be cachable, so if similar enough arguments are seen a new instance is not required. The token attribute exists to allow implementations to cache instances if they wish.
A reasonable default should be provided if there are no arguments.
Subclasses should call this method.
Magic kwargs that affect functionality here:
add_docs: if True, will append docstrings from this spec to the specific implementation
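A short sketch of the in-memory filesystem in use, assuming fsspec is installed. No constructor arguments are needed; the path `/scratch-demo/note.txt` is made up for illustration.

```python
import fsspec

# The defaults are sufficient for the in-memory implementation.
fs = fsspec.filesystem("memory")

path = "/scratch-demo/note.txt"
with fs.open(path, "wb") as f:
    f.write(b"in memory")

data = fs.cat(path)
present = fs.exists(path)
size = fs.size(path)
```

Because nothing is written to disk, this backend is convenient for tests and for trying out the generic methods listed above.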
-
-
class fsspec.implementations.webhdfs.WebHDFS(host, port=50070, kerberos=False, token=None, user=None, proxy_to=None, kerb_kwargs=None, data_proxy=None, **kwargs)
Interface to HDFS over HTTP
Three auth mechanisms are supported:
- insecure: no auth is done, and the user is assumed to be whoever they say they are (parameter user), or a predefined value such as “dr.who” if not given
- spnego: when kerberos authentication is enabled, auth is negotiated by requests_kerberos https://github.com/requests/requests-kerberos . This establishes a session based on existing kinit login and/or specified principal/password; parameters are passed with kerb_kwargs
- token: uses an existing Hadoop delegation token from another secured service. Indeed, this client can also generate such tokens when not insecure. Note that tokens expire, but can be renewed (by a previously specified user) and may allow for proxying.
Attributes
transaction
A context within which files are committed together upon exit Methods
cancel_delegation_token
(token)Stop the token from being useful cat
(path)Get the content of a file checksum
(path)Unique value for current version of file chmod
(path, mod)Set the permission at path chown
(path[, owner, group])Change owning user and/or group clear_instance_cache
()Clear the cache of filesystem instances. content_summary
(path)Total numbers of files, directories and bytes under path copy
(path1, path2, **kwargs)Copy within two locations in the filesystem cp
(path1, path2, **kwargs)Alias of FilesystemSpec.copy. current
()Return the most recently created FileSystem delete
(path[, recursive, maxdepth])Alias of FilesystemSpec.rm. disk_usage
(path[, total, maxdepth])Alias of FilesystemSpec.du. download
(rpath, lpath[, recursive])Alias of FilesystemSpec.get. du
(path[, total, maxdepth])Space used by files within a path end_transaction
()Finish write transaction, non-context version exists
(path)Is there a file at the given path find
(path[, maxdepth, withdirs])List all files below path. get
(rpath, lpath[, recursive])Copy file to local. get_delegation_token
([renewer])Retrieve token which can give the same authority to other users get_mapper
(root[, check, create])Create key/value store based on this file-system glob
(path, **kwargs)Find files by glob-matching. head
(path[, size])Get the first size bytes from file home_directory
()Get user’s home directory info
(path)Give details of entry at path invalidate_cache
([path])Discard any cached directory information isdir
(path)Is this entry directory-like? isfile
(path)Is this entry file-like? listdir
(path[, detail])Alias of FilesystemSpec.ls. ls
(path[, detail])List objects at path. makedir
(path[, create_parents])Alias of FilesystemSpec.mkdir. makedirs
(path[, exist_ok])Recursively make directories mkdir
(path, **kwargs)Create directory entry at path mkdirs
(path[, exist_ok])Alias of FilesystemSpec.makedirs. move
(path1, path2, **kwargs)Alias of FilesystemSpec.mv. mv
(path1, path2, **kwargs)Move file from one location to another open
(path[, mode, block_size, cache_options])Return a file-like object from the filesystem put
(lpath, rpath[, recursive])Upload file from local read_block
(fn, offset, length[, delimiter])Read a block of bytes from rename
(path1, path2, **kwargs)Alias of FilesystemSpec.mv. renew_delegation_token
(token)Make token live longer. rm
(path[, recursive])Delete files. rmdir
(path)Remove a directory, if empty set_replication
(path, replication)Set file replication factor size
(path)Size in bytes of file start_transaction
()Begin write transaction for deferring files, non-context version stat
(path, **kwargs)Alias of FilesystemSpec.info. tail
(path[, size])Get the last size bytes from file touch
(path[, truncate])Create empty file, or update timestamp ukey
(path)Checksum info of file, giving method and result upload
(lpath, rpath[, recursive])Alias of FilesystemSpec.put. walk
(path[, maxdepth])Return all files below path -
__init__
(host, port=50070, kerberos=False, token=None, user=None, proxy_to=None, kerb_kwargs=None, data_proxy=None, **kwargs)[source]¶ Parameters: host: str
Name-node address
port: int
Port for webHDFS
kerberos: bool
Whether to authenticate with kerberos for this connection
token: str or None
If given, use this token on every call to authenticate. A user and user-proxy may be encoded in the token and should not be also given
user: str or None
If given, assert the user name to connect with
proxy_to: str or None
If given, the user has the authority to proxy, and this value is the user in whose name actions are taken
kerb_kwargs: dict
Any extra arguments for HTTPKerberosAuth, see https://github.com/requests/requests-kerberos/blob/master/requests_kerberos/kerberos_.py
data_proxy: dict, callable or None
If given, map data-node addresses. This can be necessary if the HDFS cluster is behind a proxy, running on Docker or otherwise has a mismatch between the host-names given by the name-node and the address by which to refer to them from the client. If a dict, maps host names host->data_proxy[host]; if a callable, full URLs are passed, and function must conform to url->data_proxy(url).
kwargs
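The two forms of data_proxy described above can be illustrated with a small standalone sketch. The helper name apply_data_proxy and the example host names are hypothetical, and the host-matching logic is an assumption for illustration only; it is not taken from the WebHDFS class and may differ from what the library does internally.

```python
from urllib.parse import urlsplit, urlunsplit


def apply_data_proxy(url, data_proxy=None):
    """Rewrite a data-node URL (hypothetical helper, not part of fsspec).

    dict form:     host -> data_proxy[host]
    callable form: url  -> data_proxy(url)
    """
    if data_proxy is None:
        return url
    if callable(data_proxy):
        # The callable receives the full URL and returns the replacement URL.
        return data_proxy(url)
    parts = urlsplit(url)
    host = parts.hostname
    if host in data_proxy:
        # Swap only the host name; keep the original port, path and query.
        # Any user-info in the URL is dropped by this simplified version.
        netloc = data_proxy[host]
        if parts.port is not None:
            netloc = f"{netloc}:{parts.port}"
        return urlunsplit(parts._replace(netloc=netloc))
    return url
```

With a dict such as {"datanode1.internal": "localhost"}, a URL pointing at datanode1.internal:50075 would be redirected to localhost:50075, which is the kind of remapping needed when a cluster runs inside Docker.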
-
class
fsspec.implementations.zip.
ZipFileSystem
(fo='', mode='r', **storage_options)[source]¶ Read contents of ZIP archive as a file-system
Keeps file object open while instance lives.
This class is pickleable, but not necessarily thread-safe
Attributes
transaction
A context within which files are committed together upon exit Methods
cat
(path)Get the content of a file checksum
(path)Unique value for current version of file clear_instance_cache
()Clear the cache of filesystem instances. copy
(path1, path2, **kwargs)Copy within two locations in the filesystem cp
(path1, path2, **kwargs)Alias of FilesystemSpec.copy. current
()Return the most recently created FileSystem delete
(path[, recursive, maxdepth])Alias of FilesystemSpec.rm. disk_usage
(path[, total, maxdepth])Alias of FilesystemSpec.du. download
(rpath, lpath[, recursive])Alias of FilesystemSpec.get. du
(path[, total, maxdepth])Space used by files within a path end_transaction
()Finish write transaction, non-context version exists
(path)Is there a file at the given path find
(path[, maxdepth, withdirs])List all files below path. get
(rpath, lpath[, recursive])Copy file to local. get_mapper
(root[, check, create])Create key/value store based on this file-system glob
(path, **kwargs)Find files by glob-matching. head
(path[, size])Get the first size bytes from file info
(path, **kwargs)Give details of entry at path invalidate_cache
([path])Discard any cached directory information isdir
(path)Is this entry directory-like? isfile
(path)Is this entry file-like? listdir
(path[, detail])Alias of FilesystemSpec.ls. ls
(path[, detail])List objects at path. makedir
(path[, create_parents])Alias of FilesystemSpec.mkdir. makedirs
(path[, exist_ok])Recursively make directories mkdir
(path[, create_parents])Create directory entry at path mkdirs
(path[, exist_ok])Alias of FilesystemSpec.makedirs. move
(path1, path2, **kwargs)Alias of FilesystemSpec.mv. mv
(path1, path2, **kwargs)Move file from one location to another open
(path[, mode, block_size, cache_options])Return a file-like object from the filesystem put
(lpath, rpath[, recursive])Upload file from local read_block
(fn, offset, length[, delimiter])Read a block of bytes from rename
(path1, path2, **kwargs)Alias of FilesystemSpec.mv. rm
(path[, recursive, maxdepth])Delete files. rmdir
(path)Remove a directory, if empty size
(path)Size in bytes of file start_transaction
()Begin write transaction for deferring files, non-context version stat
(path, **kwargs)Alias of FilesystemSpec.info. tail
(path[, size])Get the last size bytes from file touch
(path[, truncate])Create empty file, or update timestamp ukey
(path)Hash of file properties, to tell if it has changed upload
(lpath, rpath[, recursive])Alias of FilesystemSpec.put. walk
(path[, maxdepth])Return all files below path -
__init__
(fo='', mode='r', **storage_options)[source]¶ Parameters: fo: str or file-like
Contains ZIP, and must exist. If a str, will fetch file using open_files(), which must return one file exactly.
mode: str
Currently, only ‘r’ accepted
storage_options: key-value
May be credentials, e.g., {‘auth’: (‘username’, ‘pword’)} or any other parameters for requests
-
-
class
fsspec.implementations.cached.
CachingFileSystem
(target_protocol=None, cache_storage='TMP', cache_check=10, check_files=False, expiry_time=604800, target_options=None, **kwargs)[source]¶ Locally caching filesystem, layer over any other FS
This class implements chunk-wise local storage of remote files, for quick access after the initial download. The files are stored in a given directory with random hashes for the filenames. If no directory is given, a temporary one is used, which should be cleaned up by the OS after the process ends. The files themselves are sparse (as implemented in MMapCache), so only the data which is accessed takes up space.
Restrictions:
- the block-size must be the same for each access of a given file, unless all blocks of the file have already been read
- caching can only be applied to file-systems which produce files derived from fsspec.spec.AbstractBufferedFile ; LocalFileSystem is also allowed, for testing
Attributes
transaction
A context within which files are committed together upon exit Methods
cat
(path)Get the content of a file checksum
(path)Unique value for current version of file clear_instance_cache
()Clear the cache of filesystem instances. close_and_update
(f, close)Called when a file is closing, so store the set of blocks copy
(path1, path2, **kwargs)Copy within two locations in the filesystem cp
(path1, path2, **kwargs)Alias of FilesystemSpec.copy. current
()Return the most recently created FileSystem delete
(path[, recursive, maxdepth])Alias of FilesystemSpec.rm. disk_usage
(path[, total, maxdepth])Alias of FilesystemSpec.du. download
(rpath, lpath[, recursive])Alias of FilesystemSpec.get. du
(path[, total, maxdepth])Space used by files within a path end_transaction
()Finish write transaction, non-context version exists
(path)Is there a file at the given path find
(path[, maxdepth, withdirs])List all files below path. get
(rpath, lpath[, recursive])Copy file to local. get_mapper
(root[, check, create])Create key/value store based on this file-system glob
(path, **kwargs)Find files by glob-matching. head
(path[, size])Get the first size bytes from file info
(path, **kwargs)Give details of entry at path invalidate_cache
([path])Discard any cached directory information isdir
(path)Is this entry directory-like? isfile
(path)Is this entry file-like? listdir
(path[, detail])Alias of FilesystemSpec.ls. load_cache
()Read set of stored blocks from file ls
(path[, detail])List objects at path. makedir
(path[, create_parents])Alias of FilesystemSpec.mkdir. makedirs
(path[, exist_ok])Recursively make directories mkdir
(path[, create_parents])Create directory entry at path mkdirs
(path[, exist_ok])Alias of FilesystemSpec.makedirs. move
(path1, path2, **kwargs)Alias of FilesystemSpec.mv. mv
(path1, path2, **kwargs)Move file from one location to another open
(path[, mode, block_size, cache_options])Return a file-like object from the filesystem put
(lpath, rpath[, recursive])Upload file from local read_block
(fn, offset, length[, delimiter])Read a block of bytes from rename
(path1, path2, **kwargs)Alias of FilesystemSpec.mv. rm
(path[, recursive, maxdepth])Delete files. rmdir
(path)Remove a directory, if empty save_cache
()Save set of stored blocks from file size
(path)Size in bytes of file start_transaction
()Begin write transaction for deferring files, non-context version stat
(path, **kwargs)Alias of FilesystemSpec.info. tail
(path[, size])Get the last size bytes from file touch
(path[, truncate])Create empty file, or update timestamp ukey
(path)Hash of file properties, to tell if it has changed upload
(lpath, rpath[, recursive])Alias of FilesystemSpec.put. walk
(path[, maxdepth])Return all files below path -
__init__
(target_protocol=None, cache_storage='TMP', cache_check=10, check_files=False, expiry_time=604800, target_options=None, **kwargs)[source]¶ Parameters: target_protocol: str
Target filesystem protocol
cache_storage: str or list(str)
Location to store files. If “TMP”, this is a temporary directory, and will be cleaned up by the OS when this process ends (or later). If a list, each location will be tried in the order given, but only the last will be considered writable.
cache_check: int
Number of seconds between reload of cache metadata
check_files: bool
Whether to explicitly see if the UID of the remote file matches the stored one before using. Warning: some file systems such as HTTP cannot reliably give a unique hash of the contents of some path, so be sure to set this option to False.
expiry_time: int
The time in seconds after which a local copy is considered useless. Set to falsy to prevent expiry. The default is equivalent to one week.
target_options: dict or None
Passed to the instantiation of the FS, if fs is None.
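The expiry_time and cache_storage rules above can be sketched in plain Python. Both function names (is_expired, locate) are hypothetical and the logic is a simplified assumption based only on the parameter descriptions; the special “TMP” value is not handled here, and the real class may store and check its metadata differently.

```python
import os
import time


def is_expired(stored_at, expiry_time, now=None):
    """True if a local copy saved at `stored_at` (epoch seconds) is too old.

    A falsy expiry_time (0, None) disables expiry, as described above.
    """
    if not expiry_time:
        return False
    now = time.time() if now is None else now
    return now - stored_at > expiry_time


def locate(name, cache_storage):
    """Return (path, needs_download) for a cached file called `name`.

    Every location is searched in the order given; if the file is found
    nowhere, the path to write to lies in the last location, the only one
    considered writable.
    """
    storage = [cache_storage] if isinstance(cache_storage, str) else list(cache_storage)
    for directory in storage:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate, False
    return os.path.join(storage[-1], name), True
```

The default expiry_time of 604800 is 7 * 24 * 60 * 60 seconds, which matches the one-week figure given above.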
-
class
fsspec.implementations.cached.
WholeFileCacheFileSystem
(target_protocol=None, cache_storage='TMP', cache_check=10, check_files=False, expiry_time=604800, target_options=None, **kwargs)[source]¶ Caches whole remote files on first access
This class is intended as a layer over any other file system, and will make a local copy of each file accessed, so that all subsequent reads are local. This is similar to CachingFileSystem, but without the block-wise functionality and so can work even when sparse files are not allowed. See its docstring for definition of the init arguments.
The class still needs access to the remote store for listing files, and may refresh cached files.
Attributes
transaction
A context within which files are committed together upon exit Methods
cat
(path)Get the content of a file checksum
(path)Unique value for current version of file clear_instance_cache
()Clear the cache of filesystem instances. close_and_update
(f, close)Called when a file is closing, so store the set of blocks copy
(path1, path2, **kwargs)Copy within two locations in the filesystem cp
(path1, path2, **kwargs)Alias of FilesystemSpec.copy. current
()Return the most recently created FileSystem delete
(path[, recursive, maxdepth])Alias of FilesystemSpec.rm. disk_usage
(path[, total, maxdepth])Alias of FilesystemSpec.du. download
(rpath, lpath[, recursive])Alias of FilesystemSpec.get. du
(path[, total, maxdepth])Space used by files within a path end_transaction
()Finish write transaction, non-context version exists
(path)Is there a file at the given path find
(path[, maxdepth, withdirs])List all files below path. get
(rpath, lpath[, recursive])Copy file to local. get_mapper
(root[, check, create])Create key/value store based on this file-system glob
(path, **kwargs)Find files by glob-matching. head
(path[, size])Get the first size bytes from file info
(path, **kwargs)Give details of entry at path invalidate_cache
([path])Discard any cached directory information isdir
(path)Is this entry directory-like? isfile
(path)Is this entry file-like? listdir
(path[, detail])Alias of FilesystemSpec.ls. load_cache
()Read set of stored blocks from file ls
(path[, detail])List objects at path. makedir
(path[, create_parents])Alias of FilesystemSpec.mkdir. makedirs
(path[, exist_ok])Recursively make directories mkdir
(path[, create_parents])Create directory entry at path mkdirs
(path[, exist_ok])Alias of FilesystemSpec.makedirs. move
(path1, path2, **kwargs)Alias of FilesystemSpec.mv. mv
(path1, path2, **kwargs)Move file from one location to another open
(path[, mode, block_size, cache_options])Return a file-like object from the filesystem put
(lpath, rpath[, recursive])Upload file from local read_block
(fn, offset, length[, delimiter])Read a block of bytes from rename
(path1, path2, **kwargs)Alias of FilesystemSpec.mv. rm
(path[, recursive, maxdepth])Delete files. rmdir
(path)Remove a directory, if empty save_cache
()Save set of stored blocks from file size
(path)Size in bytes of file start_transaction
()Begin write transaction for deferring files, non-context version stat
(path, **kwargs)Alias of FilesystemSpec.info. tail
(path[, size])Get the last size bytes from file touch
(path[, truncate])Create empty file, or update timestamp ukey
(path)Hash of file properties, to tell if it has changed upload
(lpath, rpath[, recursive])Alias of FilesystemSpec.put. walk
(path[, maxdepth])Return all files below path
Read Buffering¶
fsspec.caching.ReadAheadCache (blocksize, …) |
Cache which reads only when we get beyond a block of data |
fsspec.caching.BytesCache (blocksize, …[, trim]) |
Cache which holds data in an in-memory bytes object |
fsspec.caching.MMapCache (blocksize, fetcher, …) |
memory-mapped sparse file cache |
fsspec.caching.BlockCache (blocksize, …[, …]) |
Cache holding memory as a set of blocks. |
-
class
fsspec.caching.
ReadAheadCache
(blocksize, fetcher, size)[source]¶ Cache which reads only when we get beyond a block of data
This is a much simpler version of BytesCache, and does not attempt to fill holes in the cache or keep fragments alive. It is best suited to many small reads in a sequential order (e.g., reading lines from a file).
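The read-ahead idea can be sketched as follows. The class name SimpleReadAhead and its read method are hypothetical, and the sketch assumes that fetcher is a callable taking (start, end) byte offsets and returning that range; it is a simplified illustration of the behaviour described above, not the library's implementation.

```python
class SimpleReadAhead:
    """Keep one contiguous buffer; refetch only when a read leaves it."""

    def __init__(self, blocksize, fetcher, size):
        self.blocksize = blocksize
        self.fetcher = fetcher  # assumed signature: fetcher(start, end) -> bytes
        self.size = size        # total size of the file in bytes
        self.start = 0          # file offset of the first buffered byte
        self.end = 0            # file offset just past the last buffered byte
        self.cache = b""

    def read(self, start, end):
        end = min(end, self.size)
        if start >= end:
            return b""
        if self.start <= start and end <= self.end:
            # The whole request is already buffered: no remote call.
            return self.cache[start - self.start:end - self.start]
        # Miss: discard the old buffer (no hole filling, no kept fragments)
        # and fetch the request plus one extra block ahead of it.
        fetch_end = min(self.size, end + self.blocksize)
        self.cache = self.fetcher(start, fetch_end)
        self.start = start
        self.end = start + len(self.cache)
        return self.cache[:end - start]
```

Reading a file in small sequential pieces then triggers roughly one fetch per blocksize bytes, rather than one fetch per read.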
-
class
fsspec.caching.
BytesCache
(blocksize, fetcher, size, trim=True)[source]¶ Cache which holds data in an in-memory bytes object
Implements read-ahead by the block size, for semi-random reads progressing through the file.
Parameters: trim: bool
As we read more data, whether to discard the start of the buffer when we are more than a blocksize ahead of it.
-
class
fsspec.caching.
MMapCache
(blocksize, fetcher, size, location=None, blocks=None)[source]¶ memory-mapped sparse file cache
Opens a temporary file, which is filled block-wise as data is requested. Ensure there is enough disk space in the temporary location.
This cache method might only work on posix
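A rough sketch of the block-wise filling described above, using only the standard library. The class name SparseBlockFile is hypothetical, the fetcher is assumed to accept (start, end) byte offsets, and the file size is assumed to be greater than zero; whether the temporary file is truly sparse on disk depends on the operating system and file system.

```python
import mmap
import tempfile


class SparseBlockFile:
    """Memory-map a temporary file and fill it one block at a time."""

    def __init__(self, blocksize, fetcher, size):
        self.blocksize = blocksize
        self.fetcher = fetcher  # assumed signature: fetcher(start, end) -> bytes
        self.size = size        # must be > 0 for mmap to succeed
        self.blocks = set()     # indices of blocks already downloaded
        self._file = tempfile.TemporaryFile()
        # Extend the file to its full length by writing one byte at the end.
        self._file.seek(size - 1)
        self._file.write(b"\0")
        self._file.flush()
        self._map = mmap.mmap(self._file.fileno(), size)

    def read(self, start, end):
        end = min(end, self.size)
        if start >= end:
            return b""
        first = start // self.blocksize
        last = (end - 1) // self.blocksize
        for i in range(first, last + 1):
            if i not in self.blocks:
                lo = i * self.blocksize
                hi = min(lo + self.blocksize, self.size)
                # Copy the fetched bytes into the mapped region for block i.
                self._map[lo:hi] = self.fetcher(lo, hi)
                self.blocks.add(i)
        return self._map[start:end]

    def close(self):
        self._map.close()
        self._file.close()
```

Persisting the set of filled block indices alongside the file would allow a later process to reuse the downloaded data, which is the role the blocks and location arguments appear to play in the signature above.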
-
class
fsspec.caching.
BlockCache
(blocksize, fetcher, size, maxblocks=32)[source]¶ Cache holding memory as a set of blocks.
Requests are only ever made blocksize at a time, and are stored in an LRU cache. The least recently accessed block is discarded when more than maxblocks are stored.
Parameters: blocksize : int
The number of bytes to store in each block. Requests are only ever made for blocksize, so this should balance the overhead of making a request against the granularity of the blocks.
fetcher : Callable
size : int
The total size of the file being cached.
maxblocks : int
The maximum number of blocks to cache. The maximum memory use for this cache is then blocksize * maxblocks.
Methods
cache_info
()The statistics on the block cache.
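The block-plus-LRU behaviour can be approximated with functools.lru_cache from the standard library. The class name LRUBlockReader and its read method are hypothetical, and the fetcher is assumed to take (start, end) byte offsets; this is an illustrative sketch under those assumptions rather than the class's actual code.

```python
import functools


class LRUBlockReader:
    """Fetch whole blocks only, keeping at most `maxblocks` in memory."""

    def __init__(self, blocksize, fetcher, size, maxblocks=32):
        self.blocksize = blocksize
        self.fetcher = fetcher  # assumed signature: fetcher(start, end) -> bytes
        self.size = size
        # Wrap the per-block fetch in an LRU cache keyed on block number.
        self._block = functools.lru_cache(maxsize=maxblocks)(self._fetch_block)

    def _fetch_block(self, number):
        start = number * self.blocksize
        end = min(start + self.blocksize, self.size)
        return self.fetcher(start, end)

    def cache_info(self):
        # Hit/miss statistics as reported by functools.lru_cache.
        return self._block.cache_info()

    def read(self, start, end):
        end = min(end, self.size)
        if start >= end:
            return b""
        first = start // self.blocksize
        last = (end - 1) // self.blocksize
        data = b"".join(self._block(n) for n in range(first, last + 1))
        offset = start - first * self.blocksize
        return data[offset:offset + (end - start)]
```

With blocksize=10 and maxblocks=2, at most 20 bytes of file data are held at once, in line with the blocksize * maxblocks bound stated above.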