API Reference

User Functions

fsspec.open_files(urlpath[, mode, …]) Given a path or paths, return a list of OpenFile objects.
fsspec.open(urlpath[, mode, compression, …]) Given a path or paths, return one OpenFile object.
fsspec.filesystem(protocol, **storage_options) Instantiate filesystems for given protocol and arguments
fsspec.get_filesystem_class(protocol) Fetch named protocol implementation from the registry
fsspec.get_mapper(url[, check, create]) Create key-value interface for given URL and options
fsspec.fuse.run
fsspec.open_files(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, name_function=None, num=1, protocol=None, newline=None, **kwargs)

Given a path or paths, return a list of OpenFile objects.

For writing, a str path must contain the “*” character, which will be filled in by increasing numbers starting from 0, e.g., “part*” -> “part0”, “part1” if num=2.

For either reading or writing, can instead provide explicit list of paths.

Parameters:

urlpath: string or list

Absolute or relative filepath(s). Prefix with a protocol like s3:// to read from alternative filesystems. To read from multiple files you can pass a globstring or a list of paths, with the caveat that they must all have the same protocol.

mode: ‘rb’, ‘wt’, etc.

compression: string

Compression to use. See dask.bytes.compression.files for options.

encoding: str

For text mode only

errors: None or str

Passed to TextIOWrapper in text mode

name_function: function or None

If opening a set of files for writing, those files do not yet exist, so their names must be generated by formatting the urlpath with each sequence number.

num: int [1]

In write mode, the number of files expected to be created (passed to name_function).

protocol: str or None

If given, overrides the protocol found in the URL.

newline: bytes or None

Used for line terminator in text mode. If None, uses system default; if blank, uses no translation.

**kwargs: dict

Extra options that make sense to a particular storage connection, e.g. host, port, username, password, etc.

Returns:

List of OpenFile objects.

Examples

>>> files = open_files('2015-*-*.csv')  # doctest: +SKIP
>>> files = open_files(
...     's3://bucket/2015-*-*.csv.gz', compression='gzip'
... )  # doctest: +SKIP
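A minimal sketch of both modes, assuming a recent fsspec release and using the in-memory memory:// filesystem so nothing touches disk; the open-files-demo path is hypothetical. In write mode the single "*" is expanded num times; in read mode the same pattern is treated as a glob.

```python
import fsspec

# Write mode: the single "*" in the path is replaced by a sequence number,
# producing num=2 paths (part-0.txt and part-1.txt with the default naming).
outputs = fsspec.open_files("memory://open-files-demo/part-*.txt", mode="wt", num=2)
for i, openfile in enumerate(outputs):
    with openfile as f:  # the real file is only opened inside the with block
        f.write(f"chunk {i}\n")

# Read mode: a glob pattern expands to every existing matching path.
contents = []
for openfile in fsspec.open_files("memory://open-files-demo/part-*.txt", mode="rt"):
    with openfile as f:
        contents.append(f.read())

print(sorted(contents))
```

Passing an explicit list of paths instead of a pattern works for both modes and avoids relying on the default name_function.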
fsspec.open(urlpath, mode='rb', compression=None, encoding='utf8', errors=None, protocol=None, newline=None, **kwargs)

Given a path or paths, return one OpenFile object.

Parameters:

urlpath: string or list

Absolute or relative filepath. Prefix with a protocol like s3:// to read from alternative filesystems. Should not include glob character(s).

mode: ‘rb’, ‘wt’, etc.

compression: string

Compression to use. See dask.bytes.compression.files for options.

encoding: str

For text mode only

errors: None or str

Passed to TextIOWrapper in text mode

protocol: str or None

If given, overrides the protocol found in the URL.

newline: bytes or None

Used for line terminator in text mode. If None, uses system default; if blank, uses no translation.

**kwargs: dict

Extra options that make sense to a particular storage connection, e.g. host, port, username, password, etc.

Returns:

OpenFile object.

Examples

>>> openfile = open('2015-01-01.csv')  # doctest: +SKIP
>>> openfile = open(
...     's3://bucket/2015-01-01.csv.gz',
...     compression='gzip'
... )  # doctest: +SKIP
>>> with openfile as f:
...     df = pd.read_csv(f)  # doctest: +SKIP
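A self-contained round trip, assuming a recent fsspec release; the memory:// location is hypothetical and stands in for any remote URL. Text mode wraps the binary backend file, and the OpenFile does nothing until it is used in a with block.

```python
import fsspec

path = "memory://open-demo/notes.txt"  # hypothetical location

# Text mode layers a TextIOWrapper over the binary backend file.
with fsspec.open(path, mode="wt", encoding="utf8") as f:
    f.write("first line\nsecond line\n")

# An OpenFile is inert until used in a with block.
openfile = fsspec.open(path, mode="rt")
with openfile as f:
    lines = f.readlines()

print(lines)
```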
fsspec.filesystem(protocol, **storage_options)

Instantiate filesystems for given protocol and arguments

storage_options are specific to the protocol being chosen, and are passed directly to the class.
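A short illustration, assuming a recent fsspec release: here auto_mkdir is a storage option understood by the local filesystem class, and repeated calls with identical arguments are expected to return a cached instance.

```python
import fsspec

# storage_options are forwarded to the implementation class, here LocalFileSystem.
local = fsspec.filesystem("file", auto_mkdir=True)

# Instances are cached: identical protocol and options return the same object.
mem_a = fsspec.filesystem("memory")
mem_b = fsspec.filesystem("memory")

print(type(local).__name__, local.auto_mkdir, mem_a is mem_b)
```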

fsspec.get_filesystem_class(protocol)

Fetch named protocol implementation from the registry

The dict known_implementations maps protocol names to the locations of classes implementing the corresponding file-system. When used for the first time, appropriate imports will happen and the class will be placed in the registry. All subsequent calls will fetch directly from the registry.

Some protocol implementations require additional dependencies, and so the import may fail. In this case, the string in the “err” field of the known_implementations will be given as the error message.
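To see the registry at work, the sketch below looks up the entry for the in-memory protocol and then asks for a protocol that does not exist. It assumes a recent fsspec release in which known_implementations lives in fsspec.registry and unknown protocols raise ValueError; "no-such-protocol" is a deliberately invalid, hypothetical name.

```python
import fsspec
from fsspec.registry import known_implementations

# Each registry entry names the implementing class (and sometimes an "err" hint).
entry = known_implementations["memory"]
cls = fsspec.get_filesystem_class("memory")  # imports the class on first use

# Protocols that are neither registered nor known raise ValueError.
try:
    fsspec.get_filesystem_class("no-such-protocol")
except ValueError as exc:
    unknown_error = str(exc)

print(entry["class"], cls.__name__)
print(unknown_error)
```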

fsspec.get_mapper(url, check=False, create=False, **kwargs)

Create key-value interface for given URL and options

The URL will be of the form “protocol://location” and point to the root of the mapper required. All keys will be file-names below this location, and their values the contents of each key.

Parameters:

url: str

Root URL of mapping

check: bool

Whether to attempt to read from the location before instantiation, to check that the mapping does exist

create: bool

Whether to make the directory corresponding to the root before instantiating

Returns:

FSMap instance, the dict-like key-value store.
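A minimal key-value round trip, assuming a recent fsspec release; the memory:// root is hypothetical. Keys are file names relative to the root, and values must be bytes.

```python
import fsspec

# create=True makes the root "directory" before the mapper is returned.
mapper = fsspec.get_mapper("memory://mapper-demo/root", create=True)

# Keys become file names below the root; values must be bytes.
mapper["config.json"] = b'{"version": 1}'
mapper["chunks/0"] = b"\x00\x01\x02"

print(sorted(mapper))  # keys are paths relative to the root
print(mapper["config.json"])
```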

Base Classes

fsspec.spec.AbstractFileSystem(*args, …) An abstract super-class for pythonic file-systems
fsspec.spec.Transaction(fs) Filesystem transaction write context
fsspec.spec.AbstractBufferedFile(fs, path[, …]) Convenient class to derive from to provide buffering
fsspec.FSMap(root, fs[, check, create]) Wrap a FileSystem instance as a mutable mapping.
fsspec.core.OpenFile(fs, path[, mode, …]) File-like object to be used in a context
fsspec.core.BaseCache(blocksize, fetcher, size) Pass-through cache: doesn’t keep anything, calls every time
class fsspec.spec.AbstractFileSystem(*args, **storage_options)

An abstract super-class for pythonic file-systems

Implementations are expected to be compatible with or, better, subclass from here.

Attributes

transaction A context within which files are committed together upon exit

Methods

cat(path) Get the content of a file
checksum(path) Unique value for current version of file
clear_instance_cache() Clear the cache of filesystem instances.
copy(path1, path2, **kwargs) Copy within two locations in the filesystem
cp(path1, path2, **kwargs) Alias of FilesystemSpec.copy.
current() Return the most recently created FileSystem
delete(path[, recursive, maxdepth]) Alias of FilesystemSpec.rm.
disk_usage(path[, total, maxdepth]) Alias of FilesystemSpec.du.
download(rpath, lpath[, recursive]) Alias of FilesystemSpec.get.
du(path[, total, maxdepth]) Space used by files within a path
end_transaction() Finish write transaction, non-context version
exists(path) Is there a file at the given path
find(path[, maxdepth, withdirs]) List all files below path.
get(rpath, lpath[, recursive]) Copy file to local.
get_mapper(root[, check, create]) Create key/value store based on this file-system
glob(path, **kwargs) Find files by glob-matching.
head(path[, size]) Get the first size bytes from file
info(path, **kwargs) Give details of entry at path
invalidate_cache([path]) Discard any cached directory information
isdir(path) Is this entry directory-like?
isfile(path) Is this entry file-like?
listdir(path[, detail]) Alias of FilesystemSpec.ls.
ls(path[, detail]) List objects at path.
makedir(path[, create_parents]) Alias of FilesystemSpec.mkdir.
makedirs(path[, exist_ok]) Recursively make directories
mkdir(path[, create_parents]) Create directory entry at path
mkdirs(path[, exist_ok]) Alias of FilesystemSpec.makedirs.
move(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
mv(path1, path2, **kwargs) Move file from one location to another
open(path[, mode, block_size, cache_options]) Return a file-like object from the filesystem
put(lpath, rpath[, recursive]) Upload file from local
read_block(fn, offset, length[, delimiter]) Read a block of bytes from a file
rename(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
rm(path[, recursive, maxdepth]) Delete files.
rmdir(path) Remove a directory, if empty
size(path) Size in bytes of file
start_transaction() Begin write transaction for deferring files, non-context version
stat(path, **kwargs) Alias of FilesystemSpec.info.
tail(path[, size]) Get the last size bytes from file
touch(path[, truncate]) Create empty file, or update timestamp
ukey(path) Hash of file properties, to tell if it has changed
upload(lpath, rpath[, recursive]) Alias of FilesystemSpec.put.
walk(path[, maxdepth]) Return all files below path
class fsspec.spec.Transaction(fs)

Filesystem transaction write context

Gathers files for deferred commit or discard, so that several write operations can be finalized semi-atomically. This works by having this instance as the .transaction attribute of the given filesystem

Methods

complete([commit]) Finish transaction: commit or discard all deferred files
start() Start a transaction on this FileSystem
complete(commit=True)

Finish transaction: commit or discard all deferred files

start()

Start a transaction on this FileSystem
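A sketch of commit and discard, assuming a recent fsspec release and the in-memory filesystem (whose writes are deferred while a transaction is open); the txn-demo paths are hypothetical. Leaving the context normally commits; leaving it through an exception discards.

```python
import fsspec

fs = fsspec.filesystem("memory")

# Inside the context, writes are deferred and only committed on a clean exit.
with fs.transaction:
    with fs.open("/txn-demo/a.bin", "wb") as f:
        f.write(b"committed later")
    visible_inside = fs.exists("/txn-demo/a.bin")
visible_after = fs.exists("/txn-demo/a.bin")

# If the block raises, complete(commit=False) discards the deferred files.
try:
    with fs.transaction:
        with fs.open("/txn-demo/b.bin", "wb") as f:
            f.write(b"never committed")
        raise RuntimeError("abort the transaction")
except RuntimeError:
    pass
discarded = not fs.exists("/txn-demo/b.bin")

print(visible_inside, visible_after, discarded)
```

The non-context equivalents are fs.start_transaction() and fs.end_transaction().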

class fsspec.spec.AbstractBufferedFile(fs, path, mode='rb', block_size='default', autocommit=True, cache_type='readahead', cache_options=None, **kwargs)

Convenient class to derive from to provide buffering

In the case that the backend does not provide a pythonic file-like object already, this class contains much of the logic to build one. The only methods that need to be overridden are _upload_chunk, _initiate_upload and _fetch_range.
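A minimal read-only sketch, assuming a recent fsspec release. The hypothetical SlicingFile serves byte ranges of a file that already lives in the in-memory filesystem, so only _fetch_range has to be written; buffering, readline and seek come from the base class. A real backend would issue a ranged request instead of slicing.

```python
import fsspec
from fsspec.spec import AbstractBufferedFile


class SlicingFile(AbstractBufferedFile):
    """Hypothetical read-only file that serves byte ranges of another file."""

    def _fetch_range(self, start, end):
        # Called by the cache with a half-open byte range [start, end).
        # A real backend would issue a ranged request here instead.
        return self.fs.cat(self.path)[start:end]


fs = fsspec.filesystem("memory")
with fs.open("/buffered-demo/blob.bin", "wb") as f:
    f.write(b"hello\nworld\n")

# block_size=4 forces several small fetches; the size comes from fs.info().
bf = SlicingFile(fs, "/buffered-demo/blob.bin", mode="rb", block_size=4)
first = bf.readline()  # buffered reads until the first b"\n"
bf.seek(-6, 2)         # whence=2: relative to the end of the file
last = bf.read()       # length=-1 reads all remaining bytes
bf.close()

print(first, last)
```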

Attributes

closed

Methods

close() Close file
commit() Move from temp to final destination
discard() Throw away temporary file
fileno() Returns underlying file descriptor if one exists.
flush([force]) Write buffered data to backend store.
info() File information about this path
isatty() Return whether this is an ‘interactive’ stream.
read([length]) Return data from cache, or fetch pieces as necessary
readable() Whether opened for reading
readinto(b) mirrors builtin file’s readinto method
readinto1(b)
readline() Read until first occurrence of newline character
readlines() Return all data, split by the newline character
readuntil([char, blocks]) Return data between current position and first occurrence of char
seek(loc[, whence]) Set current file location
seekable() Whether the file is seekable (only in read mode)
tell() Current file location
truncate Truncate file to size bytes.
writable() Whether opened for writing
write(data) Write data to buffer.
writelines(lines) Write a list of lines to stream.
close()

Close file

Finalizes writes, discards cache

commit()

Move from temp to final destination

discard()

Throw away temporary file

flush(force=False)

Write buffered data to backend store.

Writes the current buffer, if it is larger than the block-size, or if the file is being closed.

Parameters:

force: bool

When closing, write the last block even if it is smaller than blocks are allowed to be. Disallows further writing to this file.

info()

File information about this path

read(length=-1)

Return data from cache, or fetch pieces as necessary

Parameters:

length: int (-1)

Number of bytes to read; if <0, all remaining bytes.

readable()

Whether opened for reading

readinto(b)

mirrors builtin file’s readinto method

https://docs.python.org/3/library/io.html#io.RawIOBase.readinto

readline()

Read until first occurrence of newline character

Note that, because of character encoding, this is not necessarily a true line ending.

readlines()

Return all data, split by the newline character

readuntil(char=b'\n', blocks=None)

Return data between current position and first occurrence of char

char is included in the output, except if the end of the file is encountered first.

Parameters:

char: bytes

Byte string to search for

blocks: None or int

How much to read in each go. Defaults to file blocksize - which may mean a new read on every call.

seek(loc, whence=0)

Set current file location

Parameters:

loc: int

byte location

whence: {0, 1, 2}

from start of file, current location or end of file, resp.

seekable()

Whether the file is seekable (only in read mode)

tell()

Current file location

writable()

Whether opened for writing

write(data)

Write data to buffer.

Buffer only sent on flush() or if buffer is greater than or equal to blocksize.

Parameters:

data: bytes

Set of bytes to be written.
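The write side can be sketched the same way, assuming a recent fsspec release. The hypothetical CollectingFile only implements _initiate_upload and _upload_chunk and records each uploaded chunk, which makes the flushing rule visible: a full block is sent as soon as it is buffered, and the short remainder goes out on close.

```python
import fsspec
from fsspec.spec import AbstractBufferedFile


class CollectingFile(AbstractBufferedFile):
    """Hypothetical write-only file that records each uploaded chunk in a list."""

    def _initiate_upload(self):
        # Called once, just before the first chunk is sent.
        self.chunks = []

    def _upload_chunk(self, final=False):
        # self.buffer holds the pending bytes; any return value other than
        # False tells flush() that the upload succeeded and the buffer can reset.
        self.chunks.append(self.buffer.getvalue())
        return True


fs = fsspec.filesystem("memory")  # only used for cache invalidation on close
f = CollectingFile(fs, "/collect-demo/out.bin", mode="wb", block_size=4)
f.write(b"abcd")  # buffer reaches block_size, so flush() uploads it
f.write(b"ef")    # stays buffered: smaller than block_size
f.close()         # flush(force=True) sends the final, short block
print(f.chunks)
```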

class fsspec.FSMap(root, fs, check=False, create=False)

Wrap a FileSystem instance as a mutable mapping.

The keys of the mapping become files under the given root, and the values (which must be bytes) the contents of those files.

Parameters:

root: string

prefix for all the files

fs: FileSystem instance

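check: bool (=False)

Performs a touch at the location, to check for write access.

create: bool (=False)

Whether to make the directory corresponding to the root before instantiating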

Examples

>>> fs = FileSystem(**parameters)  # doctest: +SKIP
>>> d = FSMap('my-data/path/', fs)  # doctest: +SKIP

or, more likely

>>> d = fs.get_mapper('my-data/path/')  # doctest: +SKIP
>>> d['loc1'] = b'Hello World' # doctest: +SKIP
>>> list(d.keys()) # doctest: +SKIP
['loc1']
>>> d['loc1'] # doctest: +SKIP
b'Hello World'
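A runnable variant of the example above that also exercises pop and clear, assuming a recent fsspec release and the in-memory filesystem; the /fsmap-demo root is hypothetical.

```python
import fsspec
from fsspec import FSMap

fs = fsspec.filesystem("memory")
d = FSMap("/fsmap-demo", fs)  # keys become files below /fsmap-demo

d["loc1"] = b"Hello World"
d["loc2"] = b"temporary"
popped = d.pop("loc2")        # returns the value and deletes the file
remaining = sorted(d)

d.clear()                     # removes every key below the root
after_clear = list(d)

print(popped, remaining, after_clear)
```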

Methods

clear() Remove all keys below root - empties out mapping
get(k[,d])
items()
keys()
pop(k[,d]) If key is not found, d is returned if given, otherwise KeyError is raised.
popitem() Remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
setdefault(k[,d])
update([E, ]**F) If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v
values()
clear()

Remove all keys below root - empties out mapping

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

class fsspec.core.OpenFile(fs, path, mode='rb', compression=None, encoding=None, errors=None, newline=None)

File-like object to be used in a context

Can layer (buffered) text-mode and compression over any file-system, which are typically binary-only.

These instances are safe to serialize, as the low-level file object is not created until the instance is entered in a with block.

Parameters:

fs: FileSystem

The file system to use for opening the file. Should match the interface of dask.bytes.local.LocalFileSystem.

path: str

Location to open

mode: str like ‘rb’, optional

Mode of the opened file

compression: str or None, optional

Compression to apply

encoding: str or None, optional

The encoding to use if opened in text mode.

errors: str or None, optional

How to handle encoding errors if opened in text mode.

newline: None or str

Passed to TextIOWrapper in text mode, how to handle line endings.

Methods

close() Close all encapsulated file objects
open() Materialise this as a real open file without context
close()

Close all encapsulated file objects

open()

Materialise this as a real open file without context

The file should be explicitly closed to avoid enclosed open file instances persisting
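A sketch of the three behaviours described above (layered compression and text mode, safe pickling, and explicit open/close), assuming a recent fsspec release; the memory:// path is hypothetical.

```python
import pickle

import fsspec

path = "memory://openfile-demo/data.txt.gz"  # hypothetical location

# Compression and text decoding are layered over the binary backend file.
with fsspec.open(path, mode="wt", compression="gzip") as f:
    f.write("line one\nline two\n")

# The OpenFile holds no open handle yet, so it can be pickled and sent elsewhere.
reader = fsspec.open(path, mode="rt", compression="gzip")
clone = pickle.loads(pickle.dumps(reader))
with clone as f:
    lines = f.read().splitlines()

# Without a context manager, call close() on the OpenFile when done.
f = reader.open()
try:
    first = f.readline()
finally:
    reader.close()

print(lines, repr(first))
```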

class fsspec.core.BaseCache(blocksize, fetcher, size)

Pass-through cache: doesn’t keep anything, calls every time

Acts as base class for other cachers

Parameters:

blocksize: int

How far to read ahead in numbers of bytes

fetcher: func

Function of the form f(start, end) which gets bytes from remote as specified

size: int

How big this file is
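The fetcher contract can be illustrated without any remote store. The sketch below is a hypothetical pure-Python stand-in, not fsspec's own class: PassThroughCache stores nothing, so two identical requests reach the fetcher twice. The method name _fetch follows the naming used inside fsspec's caches, but treat it as an assumption here.

```python
class PassThroughCache:
    """Hypothetical stand-in for the documented BaseCache behaviour."""

    def __init__(self, blocksize, fetcher, size):
        self.blocksize = blocksize  # kept for interface parity; never used to read ahead
        self.fetcher = fetcher      # f(start, end) -> bytes
        self.size = size            # total file size in bytes

    def _fetch(self, start, end):
        # Nothing is stored, so every request goes straight to the fetcher.
        return self.fetcher(start, end)


REMOTE = b"0123456789"  # stands in for a remote file
calls = []


def fetcher(start, end):
    calls.append((start, end))
    return REMOTE[start:end]


cache = PassThroughCache(blocksize=4, fetcher=fetcher, size=len(REMOTE))
first = cache._fetch(0, 3)
again = cache._fetch(0, 3)  # same range, fetched a second time
print(first, again, calls)
```

Subclasses that actually cache would keep the same constructor and override the fetch method to consult their stored bytes before calling fetcher.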

Built-in Implementations

fsspec.implementations.ftp.FTPFileSystem(host) A filesystem over classic FTP
fsspec.implementations.hdfs.PyArrowHDFS
fsspec.implementations.http.HTTPFileSystem([…]) Simple File-System for fetching data via HTTP(S)
fsspec.implementations.local.LocalFileSystem([…]) Interface to files on local storage
fsspec.implementations.memory.MemoryFileSystem(…) A filesystem based on a dict of BytesIO objects
fsspec.implementations.sftp.SFTPFileSystem
fsspec.implementations.webhdfs.WebHDFS(host) Interface to HDFS over HTTP
fsspec.implementations.zip.ZipFileSystem([…]) Read contents of ZIP archive as a file-system
fsspec.implementations.cached.CachingFileSystem([…]) Locally caching filesystem, layer over any other FS
fsspec.implementations.cached.WholeFileCacheFileSystem([…]) Caches whole remote files on first access
class fsspec.implementations.ftp.FTPFileSystem(host, port=21, username=None, password=None, acct=None, block_size=None, tempdir='/tmp', timeout=30, **kwargs)

A filesystem over classic FTP

Attributes

transaction A context within which files are committed together upon exit

Methods

cat(path) Get the content of a file
checksum(path) Unique value for current version of file
clear_instance_cache() Clear the cache of filesystem instances.
copy(path1, path2, **kwargs) Copy within two locations in the filesystem
cp(path1, path2, **kwargs) Alias of FilesystemSpec.copy.
current() Return the most recently created FileSystem
delete(path[, recursive, maxdepth]) Alias of FilesystemSpec.rm.
disk_usage(path[, total, maxdepth]) Alias of FilesystemSpec.du.
download(rpath, lpath[, recursive]) Alias of FilesystemSpec.get.
du(path[, total, maxdepth]) Space used by files within a path
end_transaction() Finish write transaction, non-context version
exists(path) Is there a file at the given path
find(path[, maxdepth, withdirs]) List all files below path.
get(rpath, lpath[, recursive]) Copy file to local.
get_mapper(root[, check, create]) Create key/value store based on this file-system
glob(path, **kwargs) Find files by glob-matching.
head(path[, size]) Get the first size bytes from file
info(path, **kwargs) Give details of entry at path
invalidate_cache([path]) Discard any cached directory information
isdir(path) Is this entry directory-like?
isfile(path) Is this entry file-like?
listdir(path[, detail]) Alias of FilesystemSpec.ls.
ls(path[, detail]) List objects at path.
makedir(path[, create_parents]) Alias of FilesystemSpec.mkdir.
makedirs(path[, exist_ok]) Recursively make directories
mkdir(path, **kwargs) Create directory entry at path
mkdirs(path[, exist_ok]) Alias of FilesystemSpec.makedirs.
move(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
mv(path1, path2, **kwargs) Move file from one location to another
open(path[, mode, block_size, cache_options]) Return a file-like object from the filesystem
put(lpath, rpath[, recursive]) Upload file from local
read_block(fn, offset, length[, delimiter]) Read a block of bytes from a file
rename(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
rm(path[, recursive, maxdepth]) Delete files.
rmdir(path) Remove a directory, if empty
size(path) Size in bytes of file
start_transaction() Begin write transaction for deferring files, non-context version
stat(path, **kwargs) Alias of FilesystemSpec.info.
tail(path[, size]) Get the last size bytes from file
touch(path[, truncate]) Create empty file, or update timestamp
ukey(path) Hash of file properties, to tell if it has changed
upload(lpath, rpath[, recursive]) Alias of FilesystemSpec.put.
walk(path[, maxdepth]) Return all files below path
__init__(host, port=21, username=None, password=None, acct=None, block_size=None, tempdir='/tmp', timeout=30, **kwargs)

You can use _get_kwargs_from_urls to get some kwargs from a reasonable FTP url.

Authentication will be anonymous if username/password are not given.
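The _get_kwargs_from_urls helper mentioned above can be exercised without a server, since it only parses the URL. The host, credentials and path below are hypothetical, and the exact set of returned keys is assumed from recent fsspec releases.

```python
from fsspec.implementations.ftp import FTPFileSystem

# Parse connection details out of a URL without opening a connection.
url = "ftp://user:secret@ftp.example.com:2121/pub/data.csv"  # hypothetical server
kwargs = FTPFileSystem._get_kwargs_from_urls(url)
print(kwargs)

# These could then be passed on, e.g. FTPFileSystem(**kwargs),
# which would try to contact the server.
```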

Parameters:

host: str

The remote server name/ip to connect to

port: int

Port to connect with

username: str or None

If authenticating, the user’s identifier

password: str or None

User’s password on the server, if authenticating

acct: str or None

Some servers also need an “account” string for auth

block_size: int or None

If given, the read-ahead or write buffer size.

tempdir: str

Directory on remote to put temporary files when in a transaction

class fsspec.implementations.http.HTTPFileSystem(simple_links=True, block_size=None, same_scheme=True, size_policy=None, **storage_options)

Simple File-System for fetching data via HTTP(S)

ls() is implemented by loading the parent page and doing a regex match on the result. If simple_links=True, anything of the form “http(s)://server.com/stuff?thing=other” is treated as a link; otherwise only links within HTML href tags will be used.

Attributes

transaction A context within which files are committed together upon exit

Methods

cat(url) Get the content of a file
checksum(path) Unique value for current version of file
clear_instance_cache() Clear the cache of filesystem instances.
copy(path1, path2, **kwargs) Copy within two locations in the filesystem
cp(path1, path2, **kwargs) Alias of FilesystemSpec.copy.
current() Return the most recently created FileSystem
delete(path[, recursive, maxdepth]) Alias of FilesystemSpec.rm.
disk_usage(path[, total, maxdepth]) Alias of FilesystemSpec.du.
download(rpath, lpath[, recursive]) Alias of FilesystemSpec.get.
du(path[, total, maxdepth]) Space used by files within a path
end_transaction() Finish write transaction, non-context version
exists(path) Is there a file at the given path
find(path[, maxdepth, withdirs]) List all files below path.
get(rpath, lpath[, recursive]) Copy file to local.
get_mapper(root[, check, create]) Create key/value store based on this file-system
glob(path, **kwargs) Find files by glob-matching.
head(path[, size]) Get the first size bytes from file
info(url, **kwargs) Get info of URL
invalidate_cache([path]) Discard any cached directory information
isdir(path) Is this entry directory-like?
isfile(path) Is this entry file-like?
listdir(path[, detail]) Alias of FilesystemSpec.ls.
ls(url[, detail]) List objects at path.
makedir(path[, create_parents]) Alias of FilesystemSpec.mkdir.
makedirs(path[, exist_ok]) Recursively make directories
mkdir(path[, create_parents]) Create directory entry at path
mkdirs(url) Make any intermediate directories to make path writable
move(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
mv(path1, path2, **kwargs) Move file from one location to another
open(path[, mode, block_size, cache_options]) Return a file-like object from the filesystem
put(lpath, rpath[, recursive]) Upload file from local
read_block(fn, offset, length[, delimiter]) Read a block of bytes from a file
rename(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
rm(path[, recursive, maxdepth]) Delete files.
rmdir(path) Remove a directory, if empty
size(path) Size in bytes of file
start_transaction() Begin write transaction for deferring files, non-context version
stat(path, **kwargs) Alias of FilesystemSpec.info.
tail(path[, size]) Get the last size bytes from file
touch(path[, truncate]) Create empty file, or update timestamp
ukey(url) Unique identifier; assume HTTP files are static, unchanging
upload(lpath, rpath[, recursive]) Alias of FilesystemSpec.put.
walk(path[, maxdepth]) Return all files below path
__init__(simple_links=True, block_size=None, same_scheme=True, size_policy=None, **storage_options)
Parameters:

block_size: int

Block size for reading bytes; if 0, will default to raw requests file-like objects instead of HTTPFile instances

simple_links: bool

If True, will consider both HTML <a> tags and anything that looks like a URL; if False, will consider only the former.

same_scheme: bool

When doing ls/glob, if this is True, only consider paths that have http/https matching the input URLs.

size_policy: this argument is deprecated

storage_options: key-value

May be credentials, e.g., {‘auth’: (‘username’, ‘pword’)} or any other parameters passed on to requests

class fsspec.implementations.local.LocalFileSystem(auto_mkdir=True, **kwargs)

Interface to files on local storage

Parameters:

auto_mkdir: bool

Whether, when opening a file, the directory containing it should be created (if it doesn’t already exist). This is assumed by pyarrow code.

Attributes

transaction A context within which files are committed together upon exit

Methods

cat(path) Get the content of a file
checksum(path) Unique value for current version of file
clear_instance_cache() Clear the cache of filesystem instances.
copy(path1, path2, **kwargs) Copy within two locations in the filesystem
cp(path1, path2, **kwargs) Alias of FilesystemSpec.copy.
current() Return the most recently created FileSystem
delete(path[, recursive, maxdepth]) Alias of FilesystemSpec.rm.
disk_usage(path[, total, maxdepth]) Alias of FilesystemSpec.du.
download(rpath, lpath[, recursive]) Alias of FilesystemSpec.get.
du(path[, total, maxdepth]) Space used by files within a path
end_transaction() Finish write transaction, non-context version
exists(path) Is there a file at the given path
find(path[, maxdepth, withdirs]) List all files below path.
get(path1, path2, **kwargs) Copy file to local.
get_mapper(root[, check, create]) Create key/value store based on this file-system
glob(path, **kargs) Find files by glob-matching.
head(path[, size]) Get the first size bytes from file
info(path, **kwargs) Give details of entry at path
invalidate_cache([path]) Discard any cached directory information
isdir(path) Is this entry directory-like?
isfile(path) Is this entry file-like?
listdir(path[, detail]) Alias of FilesystemSpec.ls.
ls(path[, detail]) List objects at path.
makedir(path[, create_parents]) Alias of FilesystemSpec.mkdir.
makedirs(path[, exist_ok]) Recursively make directories
mkdir(path[, create_parents]) Create directory entry at path
mkdirs(path[, exist_ok]) Alias of FilesystemSpec.makedirs.
move(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
mv(path1, path2, **kwargs) Move file from one location to another
open(path[, mode, block_size, cache_options]) Return a file-like object from the filesystem
put(path1, path2, **kwargs) Upload file from local
read_block(fn, offset, length[, delimiter]) Read a block of bytes from a file
rename(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
rm(path[, recursive, maxdepth]) Delete files.
rmdir(path) Remove a directory, if empty
size(path) Size in bytes of file
start_transaction() Begin write transaction for deferring files, non-context version
stat(path, **kwargs) Alias of FilesystemSpec.info.
tail(path[, size]) Get the last size bytes from file
touch(path, **kwargs) Create empty file, or update timestamp
ukey(path) Hash of file properties, to tell if it has changed
upload(lpath, rpath[, recursive]) Alias of FilesystemSpec.put.
walk(path[, maxdepth]) Return all files below path
copy(path1, path2, **kwargs)

Copy within two locations in the filesystem

get(path1, path2, **kwargs)

Copy file to local.

Possible extension: maybe should be able to copy to any file-system (streaming through local).

glob(path, **kargs)

Find files by glob-matching.

If the path ends with ‘/’ and does not contain “*”, it is essentially the same as ls(path), returning only files.

We support "**", "?" and "[..]".

kwargs are passed to ls.
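A small demonstration of the supported wildcards on the local filesystem, assuming a recent fsspec release; the files are created in a throwaway temporary directory, and the file names are hypothetical. Recursive "**" matching is left out because its exact semantics have changed between fsspec releases.

```python
import os
import tempfile

import fsspec

fs = fsspec.filesystem("file")
root = tempfile.mkdtemp()  # throwaway directory for the demo
fs.makedirs(os.path.join(root, "sub"), exist_ok=True)
for rel in ["a.csv", "b.csv", "notes.txt", os.path.join("sub", "c.csv")]:
    fs.touch(os.path.join(root, rel))


def names(pattern):
    # glob returns full paths; keep only the file names for readability.
    return sorted(os.path.basename(p) for p in fs.glob(f"{root}/{pattern}"))


star = names("*.csv")       # "*" does not cross "/", so sub/c.csv is excluded
single = names("?.csv")     # "?" matches exactly one character
bracket = names("[bn]*")    # character class: names starting with b or n
print(star, single, bracket)
```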

info(path, **kwargs)

Give details of entry at path

Returns a single dictionary, with exactly the same information as ls would with detail=True.

The default implementation calls ls and could be overridden by a shortcut. kwargs are passed on to ls().

Some file systems might not be able to measure the file’s size, in which case, the returned dict will include 'size': None.

Returns:

dict with keys: name (full path in the FS), size (in bytes), type (file, directory, or something else) and other FS-specific keys.

ls(path, detail=False)

List objects at path.

This should include subdirectories and files at that location. The difference between a file and a directory must be clear when details are requested.

The specific keys, or perhaps a FileInfo class, or similar, is TBD, but must be consistent across implementations. Must include:
  • full path to the entry (without protocol)
  • size of the entry, in bytes. If the value cannot be determined, will be None.
  • type of entry, “file”, “directory” or other

Additional information may be present, appropriate to the file-system, e.g., generation, checksum, etc.

May use refresh=True|False to allow use of self._ls_from_cache to check for a saved listing and avoid calling the backend. This would be common where listing may be expensive.

Parameters:

path: str

detail: bool

if True, gives a list of dictionaries, where each is the same as the result of info(path). If False, gives a list of paths (str).

kwargs: may have additional backend-specific options, such as version information

Returns:

List of strings if detail is False, or list of directory information dicts if detail is True.
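A sketch contrasting the two detail modes and info() on the local filesystem, assuming a recent fsspec release; the temporary directory and file names are hypothetical. Directory sizes are platform-dependent, so only their type is inspected.

```python
import os
import tempfile

import fsspec

fs = fsspec.filesystem("file")
root = tempfile.mkdtemp()  # throwaway directory for the demo
with fs.open(os.path.join(root, "data.bin"), "wb") as f:
    f.write(b"12345")
fs.makedirs(os.path.join(root, "subdir"), exist_ok=True)

names = sorted(os.path.basename(p) for p in fs.ls(root, detail=False))
details = {os.path.basename(e["name"]): e for e in fs.ls(root, detail=True)}
info = fs.info(os.path.join(root, "data.bin"))  # same dict shape as one ls entry

print(names)
print(details["data.bin"]["type"], details["data.bin"]["size"])
print(details["subdir"]["type"], info["size"])
```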

makedirs(path, exist_ok=False)

Recursively make directories

Creates directory at path and any intervening required directories. Raises exception if, for instance, the path already exists but is a file.

Parameters:

path: str

leaf directory name

exist_ok: bool (False)

If False, will error if the target already exists
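A quick check of the exist_ok behaviour on the local filesystem, assuming a recent fsspec release where LocalFileSystem.makedirs defers to os.makedirs; the nested directory names are hypothetical.

```python
import os
import tempfile

import fsspec

fs = fsspec.filesystem("file")
root = tempfile.mkdtemp()
leaf = os.path.join(root, "a", "b", "c")

fs.makedirs(leaf)                 # creates a, a/b and a/b/c in one call
fs.makedirs(leaf, exist_ok=True)  # no error: the leaf already exists

try:
    fs.makedirs(leaf)             # exist_ok defaults to False
    raised = False
except FileExistsError:
    raised = True

print(os.path.isdir(leaf), raised)
```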

mkdir(path, create_parents=True, **kwargs)

Create directory entry at path

For systems that don’t have true directories, may create an entry for this instance only and not touch the real filesystem

Parameters:

path: str

location

create_parents: bool

if True, this is equivalent to makedirs

kwargs:

may be permissions, etc.

mv(path1, path2, **kwargs)

Move file from one location to another

put(path1, path2, **kwargs)

Upload file from local

rm(path, recursive=False, maxdepth=None)

Delete files.

Parameters:

path: str or list of str

File(s) to delete.

recursive: bool

If file(s) are directories, recursively delete contents and then also remove the directory

maxdepth: int or None

Depth to pass to walk for finding files to delete, if recursive. If None, there will be no limit and infinite recursion may be possible.
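A sketch of single-file and recursive deletion on the local filesystem, assuming a recent fsspec release; everything lives in a throwaway temporary directory with hypothetical names.

```python
import os
import tempfile

import fsspec

fs = fsspec.filesystem("file")
root = tempfile.mkdtemp()
tree = os.path.join(root, "tree")
fs.makedirs(os.path.join(tree, "nested"), exist_ok=True)
fs.touch(os.path.join(tree, "nested", "x.txt"))
single = os.path.join(root, "single.txt")
fs.touch(single)

fs.rm(single)                # a plain file needs no flags
fs.rm(tree, recursive=True)  # a directory needs recursive=True

print(os.path.exists(single), os.path.exists(tree))
```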

rmdir(path)

Remove a directory, if empty

touch(path, **kwargs)

Create empty file, or update timestamp

Parameters:

path: str

file location

truncate: bool

If True, always set file size to 0; if False, update timestamp and leave file unchanged, if backend allows this
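A sketch of the truncate flag, assuming a recent fsspec release in which LocalFileSystem.touch accepts truncate (defaulting to True); older releases may only update the timestamp. The file name is hypothetical.

```python
import os
import tempfile

import fsspec

fs = fsspec.filesystem("file")
path = os.path.join(tempfile.mkdtemp(), "stamp.txt")

fs.touch(path)                  # file did not exist: an empty file is created
with open(path, "wb") as f:     # give it some content with the builtin open
    f.write(b"data")

fs.touch(path, truncate=False)  # only the timestamp changes; contents are kept
kept = fs.size(path)
fs.touch(path)                  # truncate=True (assumed default) empties the file
emptied = fs.size(path)

print(kept, emptied)
```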

class fsspec.implementations.memory.MemoryFileSystem(*args, **storage_options)

A filesystem based on a dict of BytesIO objects
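A minimal session, assuming a recent fsspec release; the /mem-demo path is hypothetical. The store is shared by all MemoryFileSystem instances in the process, so the same bytes are reachable through a memory:// URL.

```python
import fsspec

fs = fsspec.filesystem("memory")
with fs.open("/mem-demo/hello.txt", "wb") as f:
    f.write(b"hello")

content = fs.cat("/mem-demo/hello.txt")
size = fs.size("/mem-demo/hello.txt")
listing = fs.ls("/mem-demo", detail=False)

# The same bytes are reachable through the URL form used by fsspec.open.
with fsspec.open("memory://mem-demo/hello.txt", "rb") as f:
    via_url = f.read()

print(content, size, listing)
```

This makes the memory filesystem convenient for tests: no cleanup of real files is needed, though entries persist until the process exits or they are removed.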

Attributes

transaction A context within which files are committed together upon exit

Methods

cat(path) Get the content of a file
checksum(path) Unique value for current version of file
clear_instance_cache() Clear the cache of filesystem instances.
copy(path1, path2, **kwargs) Copy within two locations in the filesystem
cp(path1, path2, **kwargs) Alias of FilesystemSpec.copy.
current() Return the most recently created FileSystem
delete(path[, recursive, maxdepth]) Alias of FilesystemSpec.rm.
disk_usage(path[, total, maxdepth]) Alias of FilesystemSpec.du.
download(rpath, lpath[, recursive]) Alias of FilesystemSpec.get.
du(path[, total, maxdepth]) Space used by files within a path
end_transaction() Finish write transaction, non-context version
exists(path) Is there a file at the given path
find(path[, maxdepth, withdirs]) List all files below path.
get(rpath, lpath[, recursive]) Copy file to local.
get_mapper(root[, check, create]) Create key/value store based on this file-system
glob(path, **kwargs) Find files by glob-matching.
head(path[, size]) Get the first size bytes from file
info(path, **kwargs) Give details of entry at path
invalidate_cache([path]) Discard any cached directory information
isdir(path) Is this entry directory-like?
isfile(path) Is this entry file-like?
listdir(path[, detail]) Alias of FilesystemSpec.ls.
ls(path[, detail]) List objects at path.
makedir(path[, create_parents]) Alias of FilesystemSpec.mkdir.
makedirs(path[, exist_ok]) Recursively make directories
mkdir(path) Create directory entry at path
mkdirs(path[, exist_ok]) Alias of FilesystemSpec.makedirs.
move(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
mv(path1, path2, **kwargs) Move file from one location to another
open(path[, mode, block_size, cache_options]) Return a file-like object from the filesystem
put(lpath, rpath[, recursive]) Upload file from local
read_block(fn, offset, length[, delimiter]) Read a block of bytes from a file
rename(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
rm(path[, recursive, maxdepth]) Delete files.
rmdir(path) Remove a directory, if empty
size(path) Size in bytes of the file at path
start_transaction() Begin write transaction for deferring files, non-context version
stat(path, **kwargs) Alias of FilesystemSpec.info.
tail(path[, size]) Get the last size bytes from file
touch(path[, truncate]) Create empty file, or update timestamp
ukey(path) Hash of file properties, to tell if it has changed
upload(lpath, rpath[, recursive]) Alias of FilesystemSpec.put.
walk(path[, maxdepth]) Return all files below path
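The read_block summary does not spell out how delimiter works. As we understand fsspec's read_block docstring, reading starts at offset for length bytes. When a delimiter is given, both ends are moved forward to just after the next delimiter (an offset of 0 stays at 0), so the returned bytes include the closing delimiter. The pure-Python sketch below illustrates those semantics on a seekable binary file. read_block_sketch and _after_next_delimiter are hypothetical names, not fsspec functions, and the helper reads to end-of-file for simplicity.

```python
import io


def _after_next_delimiter(f, pos, delimiter):
    """Position just after the first ``delimiter`` at or after ``pos``,
    or end-of-file if there is none. Position 0 is always a boundary."""
    if pos == 0:
        return 0
    f.seek(pos)
    rest = f.read()  # reads to end-of-file: simple, acceptable for a sketch
    i = rest.find(delimiter)
    return pos + len(rest) if i < 0 else pos + i + len(delimiter)


def read_block_sketch(f, offset, length, delimiter=None):
    """Read ``length`` bytes from ``offset``; with ``delimiter``, snap both
    ends forward to delimiter boundaries (the end delimiter is included)."""
    if delimiter:
        start = _after_next_delimiter(f, offset, delimiter)
        end = _after_next_delimiter(f, offset + length, delimiter)
        offset, length = start, end - start
    f.seek(offset)
    return f.read(length)


# Split a line-delimited file into blocks of roughly 2 bytes each.
data = b"a\nb\nc\n"
f = io.BytesIO(data)
blocks = [read_block_sketch(f, off, 2, b"\n") for off in range(0, len(data), 2)]
```

Because both ends follow the same rule, consecutive offset windows partition the file into whole records, which is what makes this kind of method useful for splitting line-delimited files across workers.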
__init__(*args, **storage_options)

Create and configure file-system instance

Instances may be cacheable, so if sufficiently similar arguments are seen, a new instance is not required. The token attribute exists to allow implementations to cache instances if they wish.

A reasonable default should be provided if there are no arguments.

Subclasses should call this method.

Magic kwargs that affect functionality here: add_docs: if True, will append docstrings from this spec to the specific implementation.
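To make the instance-caching idea concrete, here is a minimal sketch of a class whose instances are reused when the constructor arguments match. It is an illustration only: CachedInstanceSketch is hypothetical, fsspec's own caching mechanism is not reproduced here, and the token is simply a tuple of the (assumed hashable) arguments.

```python
class CachedInstanceSketch:
    """Hypothetical file-system stand-in whose instances are cached by token."""

    _cache = {}  # token -> instance, shared by the class

    def __new__(cls, *args, **storage_options):
        # Build a hashable token from the class and constructor arguments;
        # this assumes every argument value is itself hashable.
        token = (cls, args, tuple(sorted(storage_options.items())))
        if token not in cls._cache:
            instance = super().__new__(cls)
            instance.token = token
            cls._cache[token] = instance
        return cls._cache[token]

    def __init__(self, *args, **storage_options):
        # Runs on every call, but re-assigns the same values for a cached instance.
        self.args = args
        self.storage_options = storage_options

    @classmethod
    def clear_instance_cache(cls):
        # Forget all cached instances, so the next call builds a new one.
        cls._cache.clear()
```

Calling the class twice with the same arguments returns the same object; clear_instance_cache() mirrors the method of that name in the tables above by forcing fresh instances afterwards.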
class fsspec.implementations.webhdfs.WebHDFS(host, port=50070, kerberos=False, token=None, user=None, proxy_to=None, kerb_kwargs=None, data_proxy=None, **kwargs)[source]

Interface to HDFS over HTTP

Three auth mechanisms are supported:

  • insecure: no auth is done, and the user is assumed to be whoever they say they are (parameter user), or a predefined value such as “dr.who” if not given
  • spnego: when kerberos authentication is enabled, auth is negotiated by requests_kerberos https://github.com/requests/requests-kerberos . This establishes a session based on an existing kinit login and/or a specified principal/password; parameters are passed with kerb_kwargs
  • token: uses an existing Hadoop delegation token from another secured service. This client can also generate such tokens when not running in insecure mode. Note that tokens expire, but can be renewed (by a previously specified user) and may allow for proxying.
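One way to see the difference between the modes is in the WebHDFS REST query string. The stdlib sketch below builds request URLs using the Hadoop WebHDFS REST parameters user.name (asserted user), doas (proxy user) and delegation (token). It is not how fsspec's WebHDFS builds its requests internally, and webhdfs_url is a hypothetical helper. The spnego mode adds no query parameter, because the negotiation happens in HTTP headers handled by requests_kerberos.

```python
from urllib.parse import urlencode


def webhdfs_url(host, path, op, port=50070, user=None, token=None, proxy_to=None):
    """Build a WebHDFS REST URL carrying the auth-related query parameters."""
    params = {"op": op}
    if token is not None:
        # Delegation token: identity (and any proxy user) is carried by the
        # token, so user and proxy_to are deliberately not added.
        params["delegation"] = token
    else:
        if user is not None:
            # Insecure mode: the server trusts the asserted user name.
            params["user.name"] = user
        if proxy_to is not None:
            # Act on behalf of another user, if the caller may proxy.
            params["doas"] = proxy_to
    return f"http://{host}:{port}/webhdfs/v1/{path.lstrip('/')}?{urlencode(params)}"
```

For example, webhdfs_url("nn", "/data/x.csv", "OPEN", user="alice") asserts the identity "alice" in the insecure mode.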

Attributes

transaction A context within which files are committed together upon exit

Methods

cancel_delegation_token(token) Stop the token from being useful
cat(path) Get the content of a file
checksum(path) Unique value for current version of file
chmod(path, mod) Set the permission at path
chown(path[, owner, group]) Change owning user and/or group
clear_instance_cache() Clear the cache of filesystem instances.
content_summary(path) Total numbers of files, directories and bytes under path
copy(path1, path2, **kwargs) Copy between two locations in the filesystem
cp(path1, path2, **kwargs) Alias of FilesystemSpec.copy.
current() Return the most recently created FileSystem
delete(path[, recursive, maxdepth]) Alias of FilesystemSpec.rm.
disk_usage(path[, total, maxdepth]) Alias of FilesystemSpec.du.
download(rpath, lpath[, recursive]) Alias of FilesystemSpec.get.
du(path[, total, maxdepth]) Space used by files within a path
end_transaction() Finish write transaction, non-context version
exists(path) Is there a file at the given path
find(path[, maxdepth, withdirs]) List all files below path.
get(rpath, lpath[, recursive]) Copy file to local.
get_delegation_token([renewer]) Retrieve token which can give the same authority to other users
get_mapper(root[, check, create]) Create key/value store based on this file-system
glob(path, **kwargs) Find files by glob-matching.
head(path[, size]) Get the first size bytes from file
home_directory() Get user’s home directory
info(path) Give details of entry at path
invalidate_cache([path]) Discard any cached directory information
isdir(path) Is this entry directory-like?
isfile(path) Is this entry file-like?
listdir(path[, detail]) Alias of FilesystemSpec.ls.
ls(path[, detail]) List objects at path.
makedir(path[, create_parents]) Alias of FilesystemSpec.mkdir.
makedirs(path[, exist_ok]) Recursively make directories
mkdir(path, **kwargs) Create directory entry at path
mkdirs(path[, exist_ok]) Alias of FilesystemSpec.makedirs.
move(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
mv(path1, path2, **kwargs) Move file from one location to another
open(path[, mode, block_size, cache_options]) Return a file-like object from the filesystem
put(lpath, rpath[, recursive]) Upload file from local
read_block(fn, offset, length[, delimiter]) Read a block of bytes from a file
rename(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
renew_delegation_token(token) Make token live longer.
rm(path[, recursive]) Delete files.
rmdir(path) Remove a directory, if empty
set_replication(path, replication) Set file replication factor
size(path) Size in bytes of file
start_transaction() Begin write transaction for deferring files, non-context version
stat(path, **kwargs) Alias of FilesystemSpec.info.
tail(path[, size]) Get the last size bytes from file
touch(path[, truncate]) Create empty file, or update timestamp
ukey(path) Checksum info of file, giving method and result
upload(lpath, rpath[, recursive]) Alias of FilesystemSpec.put.
walk(path[, maxdepth]) Return all files below path
__init__(host, port=50070, kerberos=False, token=None, user=None, proxy_to=None, kerb_kwargs=None, data_proxy=None, **kwargs)[source]
Parameters:

host: str

Name-node address

port: int

Port for webHDFS

kerberos: bool

Whether to authenticate with kerberos for this connection

token: str or None

If given, use this token on every call to authenticate. A user and user-proxy may be encoded in the token and should not also be given

user: str or None

If given, assert the user name to connect with

proxy_to: str or None

If given, the user has the authority to proxy, and this value is the user in whose name actions are taken

kerb_kwargs: dict

Parameters for requests_kerberos, used when kerberos authentication is enabled

data_proxy: dict, callable or None

If given, map data-node addresses. This can be necessary if the HDFS cluster is behind a proxy, running on Docker or otherwise has a mismatch between the host-names given by the name-node and the address by which to refer to them from the client. If a dict, maps host names host->data_proxy[host]; if a callable, full URLs are passed, and function must conform to url->data_proxy(url).

kwargs
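The data_proxy rules can be illustrated with a small function that rewrites a data-node URL handed out by the name-node. This is a hypothetical sketch. In particular, applying a dict as a substring replacement of each host name in the URL is an assumption about how host->data_proxy[host] is used.

```python
def apply_data_proxy(url, data_proxy=None):
    """Rewrite a data-node URL following the data_proxy rules."""
    if data_proxy is None:
        return url
    if callable(data_proxy):
        # Callable: the full URL goes in, the URL to use comes out.
        return data_proxy(url)
    # Dict: replace the first occurrence of each host name with its mapping.
    for host, replacement in data_proxy.items():
        url = url.replace(host, replacement, 1)
    return url
```

A dict suits a fixed set of renamed hosts (for example, Docker container names mapped to localhost); a callable can also change ports or schemes.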

class fsspec.implementations.zip.ZipFileSystem(fo='', mode='r', **storage_options)[source]

Read contents of ZIP archive as a file-system

Keeps file object open while instance lives.

This class is pickleable, but not necessarily thread-safe.

Attributes

transaction A context within which files are committed together upon exit

Methods

cat(path) Get the content of a file
checksum(path) Unique value for current version of file
clear_instance_cache() Clear the cache of filesystem instances.
copy(path1, path2, **kwargs) Copy between two locations in the filesystem
cp(path1, path2, **kwargs) Alias of FilesystemSpec.copy.
current() Return the most recently created FileSystem
delete(path[, recursive, maxdepth]) Alias of FilesystemSpec.rm.
disk_usage(path[, total, maxdepth]) Alias of FilesystemSpec.du.
download(rpath, lpath[, recursive]) Alias of FilesystemSpec.get.
du(path[, total, maxdepth]) Space used by files within a path
end_transaction() Finish write transaction, non-context version
exists(path) Is there a file at the given path
find(path[, maxdepth, withdirs]) List all files below path.
get(rpath, lpath[, recursive]) Copy file to local.
get_mapper(root[, check, create]) Create key/value store based on this file-system
glob(path, **kwargs) Find files by glob-matching.
head(path[, size]) Get the first size bytes from file
info(path, **kwargs) Give details of entry at path
invalidate_cache([path]) Discard any cached directory information
isdir(path) Is this entry directory-like?
isfile(path) Is this entry file-like?
listdir(path[, detail]) Alias of FilesystemSpec.ls.
ls(path[, detail]) List objects at path.
makedir(path[, create_parents]) Alias of FilesystemSpec.mkdir.
makedirs(path[, exist_ok]) Recursively make directories
mkdir(path[, create_parents]) Create directory entry at path
mkdirs(path[, exist_ok]) Alias of FilesystemSpec.makedirs.
move(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
mv(path1, path2, **kwargs) Move file from one location to another
open(path[, mode, block_size, cache_options]) Return a file-like object from the filesystem
put(lpath, rpath[, recursive]) Upload file from local
read_block(fn, offset, length[, delimiter]) Read a block of bytes from a file
rename(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
rm(path[, recursive, maxdepth]) Delete files.
rmdir(path) Remove a directory, if empty
size(path) Size in bytes of file
start_transaction() Begin write transaction for deferring files, non-context version
stat(path, **kwargs) Alias of FilesystemSpec.info.
tail(path[, size]) Get the last size bytes from file
touch(path[, truncate]) Create empty file, or update timestamp
ukey(path) Hash of file properties, to tell if it has changed
upload(lpath, rpath[, recursive]) Alias of FilesystemSpec.put.
walk(path[, maxdepth]) Return all files below path
__init__(fo='', mode='r', **storage_options)[source]
Parameters:

fo: str or file-like

Contains ZIP, and must exist. If a str, will fetch the file using open_files(), which must return exactly one file.

mode: str

Currently, only ‘r’ accepted

storage_options: key-value

May be credentials, e.g., {‘auth’: (‘username’, ‘pword’)} or any other parameters for requests
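Python's standard zipfile module is enough to sketch the idea of treating an archive as a read-only file-system: keep the archive open, derive directory listings from member names, and read members by path. ZipListingSketch is hypothetical and much simpler than ZipFileSystem (no info(), no globbing).

```python
import io
import zipfile


class ZipListingSketch:
    """Hypothetical read-only view of a ZIP archive as a tiny file-system."""

    def __init__(self, fo):
        # Keep the archive open for the lifetime of this object.
        self.zf = zipfile.ZipFile(fo, mode="r")

    def ls(self, path=""):
        # Immediate children of ``path``, inferred from member names.
        prefix = path.strip("/") + "/" if path.strip("/") else ""
        entries = set()
        for name in self.zf.namelist():
            if name.startswith(prefix) and name != prefix:
                # Keep only the first path component below the prefix.
                entries.add(prefix + name[len(prefix):].split("/")[0])
        return sorted(entries)

    def cat(self, path):
        # Whole content of one member, as bytes.
        return self.zf.read(path.lstrip("/"))


# Build a small archive in memory and browse it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("top.txt", b"hi")
    z.writestr("dir/a.txt", b"A")
    z.writestr("dir/sub/b.txt", b"B")
buf.seek(0)
fs = ZipListingSketch(buf)
```

Directories are inferred from member names, since ZIP archives need not contain explicit directory entries.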

class fsspec.implementations.cached.CachingFileSystem(target_protocol=None, cache_storage='TMP', cache_check=10, check_files=False, expiry_time=604800, target_options=None, **kwargs)[source]

Locally caching filesystem, layer over any other FS

This class implements chunk-wise local storage of remote files, for quick access after the initial download. The files are stored in a given directory with random hashes for the filenames. If no directory is given, a temporary one is used, which should be cleaned up by the OS after the process ends. The files themselves are sparse (as implemented in MMapCache), so only the data which is accessed takes up space.

Restrictions:

  • the block-size must be the same for each access of a given file, unless all blocks of the file have already been read
  • caching can only be applied to file-systems which produce files derived from fsspec.spec.AbstractBufferedFile ; LocalFileSystem is also allowed, for testing
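The block-size restriction follows from how a sparse copy is tracked: what is recorded is which block numbers are present, and a block number only identifies a byte range for the block size that produced it. The sketch below makes that check explicit, together with a hashed local file name. Both functions are hypothetical. The description says only that local names are hashes; deriving them from the remote path with SHA-256 is an assumption made here.

```python
import hashlib


def local_name(path):
    # Flat, file-system-safe name for the local copy of a remote path.
    return hashlib.sha256(path.encode()).hexdigest()


def blocks_reusable(stored_blocksize, stored_blocks, blocksize, size):
    """Can a partially cached file, recorded as a set of block indices for
    ``stored_blocksize``, be read now with ``blocksize``?"""
    if stored_blocksize == blocksize:
        return True
    # Indices are meaningless for another block size, unless every block is
    # present, in which case the whole file is already local.
    n_blocks = (size + stored_blocksize - 1) // stored_blocksize
    return len(set(stored_blocks)) == n_blocks
```

For a 100-byte file cached with 16-byte blocks, reading with 32-byte blocks is only safe once all seven blocks are present.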

Attributes

transaction A context within which files are committed together upon exit

Methods

cat(path) Get the content of a file
checksum(path) Unique value for current version of file
clear_instance_cache() Clear the cache of filesystem instances.
close_and_update(f, close) Called when a file is closing, so store the set of blocks
copy(path1, path2, **kwargs) Copy between two locations in the filesystem
cp(path1, path2, **kwargs) Alias of FilesystemSpec.copy.
current() Return the most recently created FileSystem
delete(path[, recursive, maxdepth]) Alias of FilesystemSpec.rm.
disk_usage(path[, total, maxdepth]) Alias of FilesystemSpec.du.
download(rpath, lpath[, recursive]) Alias of FilesystemSpec.get.
du(path[, total, maxdepth]) Space used by files within a path
end_transaction() Finish write transaction, non-context version
exists(path) Is there a file at the given path
find(path[, maxdepth, withdirs]) List all files below path.
get(rpath, lpath[, recursive]) Copy file to local.
get_mapper(root[, check, create]) Create key/value store based on this file-system
glob(path, **kwargs) Find files by glob-matching.
head(path[, size]) Get the first size bytes from file
info(path, **kwargs) Give details of entry at path
invalidate_cache([path]) Discard any cached directory information
isdir(path) Is this entry directory-like?
isfile(path) Is this entry file-like?
listdir(path[, detail]) Alias of FilesystemSpec.ls.
load_cache() Read set of stored blocks from file
ls(path[, detail]) List objects at path.
makedir(path[, create_parents]) Alias of FilesystemSpec.mkdir.
makedirs(path[, exist_ok]) Recursively make directories
mkdir(path[, create_parents]) Create directory entry at path
mkdirs(path[, exist_ok]) Alias of FilesystemSpec.makedirs.
move(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
mv(path1, path2, **kwargs) Move file from one location to another
open(path[, mode, block_size, cache_options]) Return a file-like object from the filesystem
put(lpath, rpath[, recursive]) Upload file from local
read_block(fn, offset, length[, delimiter]) Read a block of bytes from a file
rename(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
rm(path[, recursive, maxdepth]) Delete files.
rmdir(path) Remove a directory, if empty
save_cache() Save set of stored blocks to file
size(path) Size in bytes of file
start_transaction() Begin write transaction for deferring files, non-context version
stat(path, **kwargs) Alias of FilesystemSpec.info.
tail(path[, size]) Get the last size bytes from file
touch(path[, truncate]) Create empty file, or update timestamp
ukey(path) Hash of file properties, to tell if it has changed
upload(lpath, rpath[, recursive]) Alias of FilesystemSpec.put.
walk(path[, maxdepth]) Return all files below path
__init__(target_protocol=None, cache_storage='TMP', cache_check=10, check_files=False, expiry_time=604800, target_options=None, **kwargs)[source]
Parameters:

target_protocol: str

Target filesystem protocol

cache_storage: str or list(str)

Location to store files. If “TMP”, this is a temporary directory, and will be cleaned up by the OS when this process ends (or later). If a list, each location will be tried in the order given, but only the last will be considered writable.

cache_check: int

Number of seconds between reload of cache metadata

check_files: bool

Whether to explicitly check that the UID of the remote file matches the stored one before using the local copy. Warning: some file systems, such as HTTP, cannot reliably give a unique hash of the contents of a path, so keep this option set to False for them.

expiry_time: int

The time in seconds after which a local copy is considered useless. Set to falsy to prevent expiry. The default is equivalent to one week.

target_options: dict or None

Passed to the instantiation of the FS, if fs is None.
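Taken together, expiry_time and check_files describe when a stored copy may still be used. A minimal sketch of that decision follows. The metadata record with "time" and "uid" keys and the function name are hypothetical, and fsspec's exact bookkeeping may differ.

```python
import time

ONE_WEEK = 7 * 24 * 60 * 60  # 604800 seconds, the documented default


def cached_copy_usable(entry, expiry_time=ONE_WEEK, check_files=False,
                       remote_uid=None, now=None):
    """Return True if the cached copy described by ``entry`` may be used.

    ``entry`` is a hypothetical record {"time": <when cached>, "uid": <remote
    UID at that time>}; ``remote_uid`` is only consulted when check_files is set.
    """
    now = time.time() if now is None else now
    if expiry_time and now - entry["time"] > expiry_time:
        return False  # older than expiry_time: the local copy is considered useless
    if check_files and remote_uid != entry["uid"]:
        return False  # the remote file changed since it was cached
    return True
```

With check_files=False no remote UID is needed at all, which is why it is the safe setting for stores such as HTTP that cannot supply a reliable one.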

class fsspec.implementations.cached.WholeFileCacheFileSystem(target_protocol=None, cache_storage='TMP', cache_check=10, check_files=False, expiry_time=604800, target_options=None, **kwargs)[source]

Caches whole remote files on first access

This class is intended as a layer over any other file system, and will make a local copy of each file accessed, so that all subsequent reads are local. This is similar to CachingFileSystem, but without the block-wise functionality and so can work even when sparse files are not allowed. See its docstring for definition of the init arguments.

The class still needs access to the remote store for listing files, and may refresh cached files.
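The core of whole-file caching can be sketched under stated assumptions: the first access copies the entire remote file into a local directory, and later accesses reuse that copy without touching the remote store. WholeFileCacheSketch and its fetch_to callback are hypothetical, the hash-of-path naming is an assumption, and expiry and refresh are left out.

```python
import hashlib
import os
import tempfile


class WholeFileCacheSketch:
    """Hypothetical whole-file cache: each remote file is copied once, in full."""

    def __init__(self, fetch_to, cache_dir=None):
        # fetch_to(remote_path, local_path) must write the complete remote file.
        self.fetch_to = fetch_to
        self.cache_dir = cache_dir or tempfile.mkdtemp()

    def local_path(self, path):
        # Name the local copy after a hash of the remote path.
        local = os.path.join(self.cache_dir, hashlib.sha256(path.encode()).hexdigest())
        if not os.path.exists(local):
            partial = local + ".part"
            self.fetch_to(path, partial)
            # Rename only after the download finished, so a failed copy is never used.
            os.replace(partial, local)
        return local
```

Since no block bookkeeping is involved, the local copy is an ordinary file, which is why this approach also works where sparse files are unavailable.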

Attributes

transaction A context within which files are committed together upon exit

Methods

cat(path) Get the content of a file
checksum(path) Unique value for current version of file
clear_instance_cache() Clear the cache of filesystem instances.
close_and_update(f, close) Called when a file is closing, so store the set of blocks
copy(path1, path2, **kwargs) Copy between two locations in the filesystem
cp(path1, path2, **kwargs) Alias of FilesystemSpec.copy.
current() Return the most recently created FileSystem
delete(path[, recursive, maxdepth]) Alias of FilesystemSpec.rm.
disk_usage(path[, total, maxdepth]) Alias of FilesystemSpec.du.
download(rpath, lpath[, recursive]) Alias of FilesystemSpec.get.
du(path[, total, maxdepth]) Space used by files within a path
end_transaction() Finish write transaction, non-context version
exists(path) Is there a file at the given path
find(path[, maxdepth, withdirs]) List all files below path.
get(rpath, lpath[, recursive]) Copy file to local.
get_mapper(root[, check, create]) Create key/value store based on this file-system
glob(path, **kwargs) Find files by glob-matching.
head(path[, size]) Get the first size bytes from file
info(path, **kwargs) Give details of entry at path
invalidate_cache([path]) Discard any cached directory information
isdir(path) Is this entry directory-like?
isfile(path) Is this entry file-like?
listdir(path[, detail]) Alias of FilesystemSpec.ls.
load_cache() Read set of stored blocks from file
ls(path[, detail]) List objects at path.
makedir(path[, create_parents]) Alias of FilesystemSpec.mkdir.
makedirs(path[, exist_ok]) Recursively make directories
mkdir(path[, create_parents]) Create directory entry at path
mkdirs(path[, exist_ok]) Alias of FilesystemSpec.makedirs.
move(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
mv(path1, path2, **kwargs) Move file from one location to another
open(path[, mode, block_size, cache_options]) Return a file-like object from the filesystem
put(lpath, rpath[, recursive]) Upload file from local
read_block(fn, offset, length[, delimiter]) Read a block of bytes from a file
rename(path1, path2, **kwargs) Alias of FilesystemSpec.mv.
rm(path[, recursive, maxdepth]) Delete files.
rmdir(path) Remove a directory, if empty
save_cache() Save set of stored blocks to file
size(path) Size in bytes of file
start_transaction() Begin write transaction for deferring files, non-context version
stat(path, **kwargs) Alias of FilesystemSpec.info.
tail(path[, size]) Get the last size bytes from file
touch(path[, truncate]) Create empty file, or update timestamp
ukey(path) Hash of file properties, to tell if it has changed
upload(lpath, rpath[, recursive]) Alias of FilesystemSpec.put.
walk(path[, maxdepth]) Return all files below path

Read Buffering

fsspec.caching.ReadAheadCache(blocksize, …) Cache which reads only when we get beyond a block of data
fsspec.caching.BytesCache(blocksize, …[, trim]) Cache which holds data in an in-memory bytes object
fsspec.caching.MMapCache(blocksize, fetcher, …) Memory-mapped sparse file cache
fsspec.caching.BlockCache(blocksize, …[, …]) Cache holding memory as a set of blocks.
class fsspec.caching.ReadAheadCache(blocksize, fetcher, size)[source]

Cache which reads only when we get beyond a block of data

This is a much simpler version of BytesCache, and does not attempt to fill holes in the cache or keep fragments alive. It is best suited to many small reads in a sequential order (e.g., reading lines from a file).
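Here is a minimal sketch of read-ahead with a single buffer. It assumes the cache is given a fetcher(start, end) callable that returns those bytes. A request served entirely from the buffer costs nothing; otherwise the missing range plus one extra block is fetched and replaces the buffer. ReadAheadSketch is hypothetical and simplified, not fsspec's class.

```python
class ReadAheadSketch:
    """Hypothetical single-buffer read-ahead cache over fetcher(start, end)."""

    def __init__(self, blocksize, fetcher, size):
        self.blocksize, self.fetcher, self.size = blocksize, fetcher, size
        self.cache, self.start, self.end = b"", 0, 0  # buffer covers [start, end)

    def _fetch(self, start, end):
        end = min(end, self.size)
        if start >= end:
            return b""
        if self.start <= start and end <= self.end:
            # Entirely inside the buffer: no remote request.
            return self.cache[start - self.start:end - self.start]
        part = b""
        if self.start <= start < self.end:
            # Keep the buffered prefix, fetch only what follows it.
            part = self.cache[start - self.start:]
            start = self.end
        # Fetch the missing range plus one extra block, replacing the buffer.
        self.cache = self.fetcher(start, min(self.size, end + self.blocksize))
        self.start, self.end = start, start + len(self.cache)
        return part + self.cache[:end - start]
```

Sequential small reads, such as reading lines, mostly land inside the buffer; a random jump discards the whole buffer, since no fragments are kept.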

class fsspec.caching.BytesCache(blocksize, fetcher, size, trim=True)[source]

Cache which holds data in an in-memory bytes object

Implements read-ahead by the block size, for semi-random reads progressing through the file.

Parameters:

trim: bool

As we read more data, whether to discard the start of the buffer when we are more than a blocksize ahead of it.
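The sketch below illustrates read-ahead with a growing contiguous buffer and optional trimming. It assumes the same kind of fetcher(start, end) callable. The class is hypothetical. Its trim rule (keep at most one block behind the current read position) is one reading of the description, not necessarily fsspec's exact rule.

```python
class BytesCacheSketch:
    """Hypothetical read-ahead cache over one contiguous in-memory bytes buffer."""

    def __init__(self, blocksize, fetcher, size, trim=True):
        self.blocksize, self.fetcher, self.size, self.trim = blocksize, fetcher, size, trim
        self.cache = b""  # buffered bytes, starting at file offset self.start
        self.start = 0

    @property
    def end(self):
        return self.start + len(self.cache)

    def _fetch(self, start, end):
        end = min(end, self.size)
        if start >= end:
            return b""
        if not self.start <= start <= self.end:
            # Request starts outside the buffer: begin a fresh buffer there.
            self.cache, self.start = b"", start
        if end > self.end:
            # Append the missing bytes plus one block of read-ahead.
            self.cache += self.fetcher(self.end, min(self.size, end + self.blocksize))
        out = self.cache[start - self.start:end - self.start]
        if self.trim and start - self.start > self.blocksize:
            # More than a block ahead of the buffer start: drop the stale
            # prefix, keeping one block behind the current read position.
            drop = start - self.start - self.blocksize
            self.cache, self.start = self.cache[drop:], self.start + drop
        return out
```

Unlike the single-buffer read-ahead, reads that continue past the buffer extend it rather than replace it, so semi-random reads that move forward through the file keep hitting memory.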

class fsspec.caching.MMapCache(blocksize, fetcher, size, location=None, blocks=None)[source]

Memory-mapped sparse file cache

Opens a temporary file, which is filled block-wise when data is requested. Ensure there is enough disk space in the temporary location.

This cache method might only work on POSIX systems.
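Here is a minimal sketch of the memory-mapped approach. It uses Python's mmap over a temporary file truncated to the full file size: blocks are fetched only on first access and written into the mapping, and a set records which blocks are present. The class name and the fetcher(start, end) callable are hypothetical. The sketch assumes a non-empty file and a fetcher that returns exactly the requested number of bytes.

```python
import mmap
import tempfile


class MMapBlockSketch:
    """Hypothetical block-wise cache backed by a memory-mapped temporary file."""

    def __init__(self, blocksize, fetcher, size):
        self.blocksize, self.fetcher, self.size = blocksize, fetcher, size
        self.blocks = set()  # indices of blocks already written to the file
        self._file = tempfile.TemporaryFile()
        self._file.truncate(size)  # sparse if the OS and file-system allow it
        self.cache = mmap.mmap(self._file.fileno(), size)

    def _fetch(self, start, end):
        end = min(end, self.size)
        if start >= end:
            return b""
        for i in range(start // self.blocksize, (end - 1) // self.blocksize + 1):
            if i not in self.blocks:
                lo = i * self.blocksize
                hi = min(lo + self.blocksize, self.size)
                self.cache[lo:hi] = self.fetcher(lo, hi)  # must be hi - lo bytes
                self.blocks.add(i)
        return self.cache[start:end]

    def close(self):
        self.cache.close()
        self._file.close()
```

Only touched blocks are ever written, so on file-systems with sparse-file support the disk usage tracks the data actually read.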

class fsspec.caching.BlockCache(blocksize, fetcher, size, maxblocks=32)[source]

Cache holding memory as a set of blocks.

Requests are only ever made blocksize at a time, and are stored in an LRU cache. The least recently accessed block is discarded when more than maxblocks are stored.

Parameters:

blocksize : int

The number of bytes to store in each block. Requests are only ever made for blocksize, so this should balance the overhead of making a request against the granularity of the blocks.

fetcher : Callable

size : int

The total size of the file being cached.

maxblocks : int

The maximum number of blocks to cache. The maximum memory use for this cache is then blocksize * maxblocks.

Methods

cache_info() The statistics on the block cache.
cache_info()[source]

The statistics on the block cache.

Returns:

NamedTuple

Returned directly from the LRU Cache used internally.
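The LRU behaviour described above can be sketched with functools.lru_cache wrapped around a per-block fetch, which also provides a cache_info() named tuple. The class and the fetcher(start, end) callable are hypothetical; this illustrates the described design and is not fsspec's source.

```python
import functools


class BlockCacheSketch:
    """Hypothetical LRU cache of fixed-size blocks over fetcher(start, end)."""

    def __init__(self, blocksize, fetcher, size, maxblocks=32):
        self.blocksize, self.fetcher, self.size = blocksize, fetcher, size
        # Per-instance LRU: block number -> bytes, evicting the least recently used.
        self._fetch_block_cached = functools.lru_cache(maxblocks)(self._fetch_block)

    def _fetch_block(self, i):
        # Requests are only ever made for one whole block at a time.
        lo = i * self.blocksize
        return self.fetcher(lo, min(lo + self.blocksize, self.size))

    def cache_info(self):
        # CacheInfo(hits, misses, maxsize, currsize) from functools.
        return self._fetch_block_cached.cache_info()

    def _fetch(self, start, end):
        end = min(end, self.size)
        if start >= end:
            return b""
        first, last = start // self.blocksize, (end - 1) // self.blocksize
        data = b"".join(self._fetch_block_cached(i) for i in range(first, last + 1))
        offset = first * self.blocksize
        return data[start - offset:end - offset]
```

Wrapping the bound method per instance, rather than decorating the method in the class body, keeps each cache's blocks and statistics separate.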