xphyle package

Public API

xphyle module

The main xphyle methods – xopen, popen, and open_.

class xphyle.BufferWrapper(fileobj: Union[os.PathLike, IO, io.IOBase], buffer: Union[_io.StringIO, _io.BytesIO], compression: Union[bool, str] = False, name: str = None, **kwargs)

Bases: xphyle.FileWrapper

Wrapper around a string/bytes buffer.

Parameters:
  • fileobj – The fileobj to wrap (the raw or wrapped buffer).
  • buffer – The raw buffer.
  • compression – Compression type.
  • close_fileobj – Whether to close the buffer when closing this wrapper.
getvalue() → Union[bytes, str]

Returns the contents of the buffer.

class xphyle.EventListener(**kwargs)

Bases: typing.Generic

Base class for listener events that can be registered on a FileLikeWrapper.

Parameters:kwargs – keyword arguments to pass through to execute
execute(wrapper: E, **kwargs) → None

Handle an event. This method must be implemented by subclasses.

Parameters:
  • wrapper – The EventManager on which this event was registered.
  • kwargs – A union of the keyword arguments passed to the constructor and the __call__ method.
class xphyle.EventManager

Bases: object

Mixin type for classes that allow registering event listeners.

register_listener(event: Union[str, xphyle.types.EventType], listener: xphyle.EventListener) → None

Register an event listener.

Parameters:
  • event – Event name (currently, only ‘close’ is recognized)
  • listener – A listener object, which must be callable with a single argument – this file wrapper.
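
The listener protocol described above can be pictured with a small stand-in. This is a minimal sketch, not xphyle's implementation: SketchListener, SketchManager, RecordingListener, and fire are hypothetical names, and the sketch assumes that call-time keyword arguments override constructor arguments when the two sets are merged.

```python
class SketchListener:
    """Hypothetical stand-in for an event listener base class."""

    def __init__(self, **kwargs):
        # Keyword arguments captured when the listener is created.
        self.create_args = kwargs

    def __call__(self, wrapper, **call_args):
        # Merge constructor and call-time kwargs (call-time wins; an assumption).
        merged = {**self.create_args, **call_args}
        self.execute(wrapper, **merged)

    def execute(self, wrapper, **kwargs):
        raise NotImplementedError("subclasses must implement execute()")


class SketchManager:
    """Hypothetical stand-in for a class that accepts listeners."""

    def __init__(self):
        self._listeners = {}

    def register_listener(self, event, listener):
        self._listeners.setdefault(event, []).append(listener)

    def fire(self, event, **kwargs):
        # Call every listener registered for this event with the manager itself.
        for listener in self._listeners.get(event, []):
            listener(self, **kwargs)


class RecordingListener(SketchListener):
    """Records each call so the merged kwargs can be inspected."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.calls = []

    def execute(self, wrapper, **kwargs):
        self.calls.append((wrapper, kwargs))


manager = SketchManager()
listener = RecordingListener(tag="demo")
manager.register_listener("close", listener)
manager.fire("close", reason="done")
```

After the call to fire, listener.calls holds one entry whose keyword arguments combine tag from the constructor with reason from the call.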
class xphyle.FileLikeWrapper(fileobj: Union[IO, io.IOBase], compression: Union[bool, str] = False, close_fileobj: bool = True)

Bases: xphyle.EventManager, xphyle.types.FileLikeBase

Base class for wrappers around file-like objects. By default, method calls are forwarded to the file object. Adds the following:

1. A simple event system by which registered listeners can respond to file events. Currently, ‘close’ is the only supported event.
2. Wraps file iterators in a progress bar (if configured).

Parameters:
  • fileobj – The file-like object to wrap.
  • compression – Whether the wrapped file is compressed.
  • close_fileobj – Whether to close the wrapped file object when closing this wrapper.
close() → None

Close the file, close an open iterator, and fire ‘close’ events to any listeners.

closed
fileno() → int
flush() → None
isatty() → bool
mode
name
peek(size: int = 1) → Union[bytes, str]

Return bytes/characters from the stream without advancing the position. At most one single read on the raw stream is done to satisfy the call.

Parameters:size – The max number of bytes/characters to return.
Returns:At most size bytes/characters. Unlike io.BufferedReader.peek(), will never return more than size bytes/characters.

Notes

If the file uses multi-byte encoding and N characters are desired, it is up to the caller to request size=2N.
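
The difference from io.BufferedReader.peek() can be shown with the standard library alone. The sketch below is illustrative rather than xphyle's code: capped_peek is a hypothetical helper that slices the buffered data down to the requested size, and the sample text is chosen so that its first character occupies two bytes in UTF-8.

```python
import io


def capped_peek(reader: io.BufferedReader, size: int = 1) -> bytes:
    """Return at most `size` bytes without advancing the stream position."""
    # BufferedReader.peek() may return more bytes than requested, so slice.
    return reader.peek(size)[:size]


# "é" is encoded as two bytes in UTF-8, so one character needs size=2.
reader = io.BufferedReader(io.BytesIO("é-data".encode("utf-8")))
one_byte = capped_peek(reader, 1)
two_bytes = capped_peek(reader, 2)
position_after_peek = reader.tell()
```

Here one_byte holds only half of the first character, which is why a caller working with a multi-byte encoding has to request more bytes than characters.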

read(size: int = -1) → bytes
readable() → bool
readline(size: int = -1) → Union[bytes, str]
readlines(hint: int = -1) → List[Union[bytes, str]]
seek(offset, whence: int = 0) → int
seekable() → bool
tell() → int
truncate(size: int = None) → int
writable() → bool
write(string: Union[bytes, str]) → int
writelines(lines: Iterable[Union[bytes, str]]) → None
class xphyle.FileWrapper(source: Union[os.PathLike, IO, io.IOBase], mode: Union[str, xphyle.types.FileMode] = 'w', compression: Union[bool, str] = False, name: Union[str, pathlib.PurePath] = None, close_fileobj: bool = True, **kwargs)

Bases: xphyle.FileLikeWrapper

Wrapper around a file object.

Parameters:
  • source – Path or file object.
  • mode – File open mode.
  • compression – Compression type.
  • name – Use an alternative name for the file.
  • kwargs – Additional arguments to pass to xopen.
name
path

The source path.

class xphyle.Process(args, stdin: Union[os.PathLike, IO, io.IOBase, int] = None, stdout: Union[os.PathLike, IO, io.IOBase, int] = None, stderr: Union[os.PathLike, IO, io.IOBase, int] = None, **kwargs)

Bases: subprocess.Popen, xphyle.EventManager, xphyle.types.FileLikeBase, typing.Iterable

Subclass of subprocess.Popen with the following additions:

  • Can wrap the stdin/stdout/stderr streams (e.g. to send compressed data to a process’ stdin or read compressed data from its stdout/stderr); see wrap_pipes.
  • Provides Process.close for properly closing stdin/stdout/stderr streams and terminating the process.
  • Implements required methods to make objects ‘file-like’.

Parameters:
  • args – Positional arguments, passed to subprocess.Popen constructor.
  • stdin, stdout, stderr – Identical to the same arguments to subprocess.Popen.
  • kwargs – Keyword arguments, passed to subprocess.Popen constructor.
check_valid_returncode(valid: Container[int] = (0, None, <Signals.SIGPIPE: 13>, 141))

Check that the return code does not have a value associated with an error state.

Raises:
close() → None
close1(timeout: float = None, raise_on_error: bool = False, record_output: bool = False, terminate: bool = False) → Optional[int]

Close stdin/stdout/stderr streams, wait for process to finish, and return the process return code.

Parameters:
  • timeout – time in seconds to wait for stream to close; negative value or None waits indefinitely.
  • raise_on_error – Whether to raise an exception if the process returns an error.
  • record_output – Whether to store contents of stdout and stderr in place of the actual streams after closing them.
  • terminate – If True and timeout is a positive integer, the process is terminated if it doesn’t finish within timeout seconds.

Notes

If record_output is True, and if stdout/stderr is a PIPE, any contents are read and stored as the value of the stdout/stderr attribute. Otherwise the data is lost.

Returns:The process returncode.
Raises:IOError – if raise_on_error is True and the process returns an error code.
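
The timeout and terminate semantics can be approximated with plain subprocess calls. The following is a sketch under stated assumptions, not the body of close1: close_and_wait is a hypothetical helper, it relies on Popen.communicate() to drain and close the pipes, and it treats a negative timeout as "wait indefinitely" to mirror the parameter description above.

```python
import subprocess
import sys


def close_and_wait(proc, timeout=None, terminate=False):
    """Hypothetical helper: drain pipes, wait, and optionally terminate."""
    if timeout is not None and timeout < 0:
        timeout = None  # a negative value means wait indefinitely
    try:
        out, err = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        if not terminate:
            raise
        # The process did not finish in time: stop it, then collect output.
        proc.terminate()
        out, err = proc.communicate()
    return proc.returncode, out, err


proc = subprocess.Popen(
    [sys.executable, "-c", "print('done')"],
    stdout=subprocess.PIPE,
)
returncode, recorded_stdout, recorded_stderr = close_and_wait(proc, timeout=60)
```

Only stdout was opened as a pipe in this usage, so recorded_stderr is None, which parallels the note that output from streams other than a PIPE is not captured.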
closed

Whether the Process has been closed.

communicate(inp: Union[bytes, str] = None, timeout: float = None) → Tuple[IO, IO]

Send input to stdin, wait for process to terminate, return results.

Parameters:
  • inp – Input to send to stdin.
  • timeout – Time to wait for process to finish.
Returns:

Tuple of (stdout, stderr).

flush() → None

Flushes stdin if there is one.

get_reader(which: str = None) → Union[IO, io.IOBase]

Returns the stream for reading data from stdout/stderr.

Parameters:which – Which stream to read from, ‘stdout’ or ‘stderr’. If None, stdout is used if it exists, otherwise stderr.
Returns:The specified stream, or None if the stream doesn’t exist.
get_readers()

Returns (stdout, stderr) tuple.

get_writer() → Union[IO, io.IOBase]

Returns the stream for writing to stdin.

is_wrapped(name: str) → bool

Returns True if the stream corresponding to name is wrapped.

Parameters:name – One of ‘stdin’, ‘stdout’, ‘stderr’
mode
name
read(size: int = -1, which: str = None) → bytes

Read size bytes/characters from stdout or stderr.

Parameters:
  • size – Number of bytes/characters to read.
  • which – Which stream to read from, ‘stdout’ or ‘stderr’. If None, stdout is used if it exists, otherwise stderr.
Returns:

The bytes/characters read from the specified stream.

readable() → bool

Returns True if this Popen has stdout and/or stderr, otherwise False.

readline(hint: int = -1, which: str = None) → Union[bytes, str]
readlines(sizehint: int = -1, which: str = None) → List[Union[bytes, str]]
wrap_pipes(**kwargs) → None

Wrap stdin/stdout/stderr PIPE streams using xopen.

Parameters:kwargs – for each of ‘stdin’, ‘stdout’, ‘stderr’, a dict providing arguments to xopen describing how the stream should be wrapped.
writable() → bool

Returns True if this Popen has stdin, otherwise False.

write(data: Union[bytes, str]) → int

Write data to stdin.

Parameters:data – The data to write; must be bytes if stdin is a byte stream or string if stdin is a text stream.
Returns:Number of bytes/characters written
writelines(lines: Iterable[Union[bytes, str]]) → None
class xphyle.StdWrapper(stream: Union[IO, io.IOBase], compression: Union[bool, str] = False)

Bases: xphyle.FileLikeWrapper

Wrapper around stdin/stdout/stderr.

Parameters:
  • stream – The stream to wrap.
  • compression – Compression type.
closed
xphyle.configure(default_xopen_context_wrapper: Optional[bool] = None, progress: Optional[bool] = None, progress_wrapper: Optional[Callable[..., Iterable]] = None, system_progress: Optional[bool] = None, system_progress_wrapper: Union[str, Sequence[str], None] = None, threads: Optional[int] = None, executable_path: Union[pathlib.PurePath, Sequence[pathlib.PurePath], None] = None) → None

Configure xphyle.

Parameters:
  • default_xopen_context_wrapper – Whether to wrap files opened by xopen in FileLikeWrappers by default (when xopen’s context_wrapper parameter is None).
  • progress – Whether to wrap long-running operations with a progress bar
  • progress_wrapper – Specify a non-default progress wrapper
  • system_progress – Whether to use progress bars for system-level
  • system_progress_wrapper – Specify a non-default system progress wrapper
  • threads – The number of threads that can be used by compression formats that support parallel compression/decompression. Set to None or a number < 1 to automatically initialize to the number of cores on the local machine.
  • executable_path – List of paths where xphyle should look for system executables. These will be searched before the default system path.
xphyle.get_compressor(name_or_path: Union[str, pathlib.PurePath]) → Optional[xphyle.formats.CompressionFormat]

Returns the CompressionFormat for the given path or compression type name.

xphyle.guess_file_format(path: pathlib.PurePath) → str

Try to guess the file format, first from the extension, and then from the header bytes.

Parameters:path – The path to the file
Returns:The guessed format, or None if one could not be determined
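
The two-step strategy (extension first, then header bytes) can be sketched with the standard library. This is an illustration only: guess_format_sketch, EXTENSIONS, and MAGIC are hypothetical names, the format labels are arbitrary, and the tables cover just three formats whose magic numbers are well known.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical lookup tables; a real library would cover more formats.
EXTENSIONS = {".gz": "gzip", ".bz2": "bz2", ".xz": "lzma"}
MAGIC = {b"\x1f\x8b": "gzip", b"BZh": "bz2", b"\xfd7zXZ\x00": "lzma"}


def guess_format_sketch(path: Path) -> Optional[str]:
    # Step 1: trust the file extension when it is recognized.
    fmt = EXTENSIONS.get(path.suffix.lower())
    if fmt is not None:
        return fmt
    # Step 2: fall back to the leading bytes of the file.
    try:
        with open(path, "rb") as handle:
            header = handle.read(max(len(magic) for magic in MAGIC))
    except OSError:
        return None
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return None


with tempfile.TemporaryDirectory() as tmp:
    unlabeled = Path(tmp) / "data.bin"
    unlabeled.write_bytes(gzip.compress(b"hello"))
    plain = Path(tmp) / "notes.txt"
    plain.write_text("hello")
    from_header = guess_format_sketch(unlabeled)
    from_extension = guess_format_sketch(Path(tmp) / "missing.bz2")
    undetermined = guess_format_sketch(plain)
```

In this sketch the extension check never touches the file system, so a recognized suffix is reported even for a path that does not exist.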
xphyle.open_(target: Union[os.PathLike, IO, io.IOBase, bytes, str, Type[Union[bytes, str]]], mode: Union[str, xphyle.types.FileMode] = None, errors: bool = True, wrap_fileobj: bool = True, **kwargs) → Generator[[Union[IO, io.IOBase], None], None]

Context manager that frees you from checking if an argument is a path or a file object. Calls xopen to open files.

Parameters:
  • target – A relative or absolute path, a URL, a system command, a file-like object, or bytes or str to indicate a writeable byte/string buffer.
  • mode – The file open mode.
  • errors – Whether to raise an error if there is a problem opening the file. If False, yields None when there is an error.
  • wrap_fileobj – If target is a file-like object, this parameter determines whether it will be passed to xopen for wrapping (True) or returned directly (False). If False, any kwargs are ignored.
  • kwargs – Additional args to pass through to xopen (if target is a path).
Yields:

A file-like object, or None if errors is False and there is a problem opening the file.

Examples

    with open_('myfile') as infile:
        print(next(infile))

    fileobj = open('myfile')
    with open_(fileobj) as infile:
        print(next(infile))
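
The path-or-file idea behind this context manager can be sketched with contextlib. The code below is a simplified stand-in, not xphyle's open_: open_sketch is a hypothetical name, it uses the builtin open instead of xopen, and it assumes that a file object supplied by the caller is left open on exit.

```python
import contextlib
import io
import os


@contextlib.contextmanager
def open_sketch(target, mode="r", errors=True):
    """Hypothetical stand-in: accept either a path or a file-like object."""
    if isinstance(target, (str, os.PathLike)):
        try:
            handle = open(target, mode)
        except OSError:
            if errors:
                raise
            handle = None  # with errors=False the caller receives None
        try:
            yield handle
        finally:
            if handle is not None:
                handle.close()
    else:
        # Already file-like: hand it back and leave closing to the caller
        # (an assumption of this sketch).
        yield target


buffer = io.StringIO("first line\nsecond line\n")
with open_sketch(buffer) as infile:
    first = next(infile)

with open_sketch("no-such-directory/no-such-file.txt", errors=False) as missing:
    result = missing
```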
xphyle.popen(args: Iterable, stdin: Union[os.PathLike, IO, io.IOBase, int, dict, Tuple[Union[os.PathLike, IO, io.IOBase, int], Union[str, xphyle.types.FileMode, dict]]] = None, stdout: Union[os.PathLike, IO, io.IOBase, int, dict, Tuple[Union[os.PathLike, IO, io.IOBase, int], Union[str, xphyle.types.FileMode, dict]]] = None, stderr: Union[os.PathLike, IO, io.IOBase, int, dict, Tuple[Union[os.PathLike, IO, io.IOBase, int], Union[str, xphyle.types.FileMode, dict]]] = None, shell: bool = False, **kwargs) → xphyle.Process

Opens a subprocess, using xopen to open input/output streams.

Parameters:
  • args – argument string or tuple of arguments.
  • stdin, stdout, stderr – For each stream: a file to use as the stream, PIPE to open a pipe, a dict to pass xopen args for a PIPE, a tuple of (path, mode) or a tuple of (path, dict), where the dict contains parameters to pass to xopen.
  • shell – The ‘shell’ arg from subprocess.Popen.
  • kwargs – additional arguments to subprocess.Popen.
Returns:

A Process object, which is a subclass of subprocess.Popen.

xphyle.xopen(target: Union[os.PathLike, IO, io.IOBase, bytes, str, Type[Union[bytes, str]]], mode: Union[str, xphyle.types.FileMode] = None, compression: Union[bool, str] = None, use_system: bool = True, allow_subprocesses: bool = True, context_wrapper: bool = None, file_type: xphyle.types.FileType = None, validate: bool = True, overwrite: bool = True, close_fileobj: bool = True, **kwargs) → Union[IO, io.IOBase]

Replacement for the builtin open function that can also open URLs and subprocesses, and automatically handles compressed files.

Parameters:
  • target – A relative or absolute path, a URL, a system command, a file-like object, or bytes or str to indicate a writeable byte/string buffer.
  • mode – Some combination of the access mode (‘r’, ‘w’, ‘a’, or ‘x’) and the open mode (‘b’ or ‘t’). If the latter is not given, ‘t’ is used by default.
  • compression – If None or True, compression type (if any) will be determined automatically. If False, no attempt will be made to determine compression type. Otherwise this must specify the compression type (e.g. ‘gz’). See xphyle.compression for details. Note that compression will not be guessed for ‘-’ (stdin).
  • use_system – Whether to attempt to use system-level compression programs.
  • allow_subprocesses – Whether to allow path to be a subprocess (e.g. ‘|cat’). There are security risks associated with allowing users to run arbitrary system commands.
  • context_wrapper – If True, the file is wrapped in a FileLikeWrapper subclass before returning (FileWrapper for files/URLs, StdWrapper for STDIN/STDOUT/STDERR). If None, the default value (set using configure) is used.
  • file_type – a FileType; explicitly specify the file type. By default the file type is detected, but auto-detection might make mistakes, e.g. when a local file name contains a colon (‘:’).
  • validate – Ensure that the user-specified compression format matches the format guessed from the file extension or magic bytes.
  • overwrite – For files opened in write mode, whether to overwrite existing files (True).
  • close_fileobj – When path is a file-like object / file_type is FileType.FILELIKE, and context_wrapper is True, whether to close the underlying file when closing the wrapper.
  • kwargs – Additional keyword arguments to pass to open.
path is interpreted as follows:
  • If starts with ‘|’, it is assumed to be a system command
  • If a file-like object, it is used as-is
  • If one of STDIN, STDOUT, STDERR, the appropriate sys stream is used
  • If parseable by xphyle.urls.parse_url(), it is assumed to be a URL
  • If file_type == FileType.BUFFER and path is a string or bytes and mode is readable, a new StringIO/BytesIO is created with ‘path’ passed to its constructor.
  • Otherwise it is assumed to be a local file
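
The precedence rules listed above can be sketched as a small classifier. This is an assumption-laden illustration: classify_target and its return labels are hypothetical, the '-' and '_' placeholders for the standard streams are taken from notes elsewhere in this reference, URL detection is reduced to a scheme whitelist via urllib.parse instead of xphyle.urls.parse_url(), and the buffer case is omitted because it requires an explicit file_type.

```python
import io
from urllib.parse import urlparse


def classify_target(target, url_schemes=("http", "https", "ftp")):
    """Hypothetical classifier mirroring the order of the rules above."""
    if isinstance(target, str) and target.startswith("|"):
        return "process"
    if hasattr(target, "read") or hasattr(target, "write"):
        return "filelike"
    if target in ("-", "_"):
        return "std"
    if isinstance(target, str) and urlparse(target).scheme in url_schemes:
        return "url"
    return "local"


kinds = [
    classify_target("|cat"),
    classify_target(io.StringIO()),
    classify_target("-"),
    classify_target("https://example.com/data.gz"),
    classify_target("data/reads.txt.gz"),
    classify_target("sample:1.txt"),
]
```

The last entry shows why a colon in a local file name is a hazard for auto-detection: this sketch sidesteps it with a scheme whitelist, while an explicit file_type removes the ambiguity altogether.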

If use_system is True and the file is compressed, the file is opened with a pipe to the system-level compression program (e.g. gzip for ‘.gz’ files) if possible, otherwise the corresponding python library is used.
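
The system-program-first behavior can be approximated for gzip alone. This sketch is not how xopen is implemented: read_gzip_sketch is a hypothetical helper that reads the whole file at once, looks for a gzip executable with shutil.which, and falls back to Python's gzip module when none is found or when use_system is False.

```python
import gzip
import shutil
import subprocess
import tempfile
from pathlib import Path


def read_gzip_sketch(path, use_system=True):
    """Hypothetical helper: prefer the system gzip, else the Python library."""
    executable = shutil.which("gzip") if use_system else None
    if executable is not None:
        # -d decompresses, -c writes the result to stdout.
        result = subprocess.run(
            [executable, "-dc", str(path)], stdout=subprocess.PIPE, check=True
        )
        return result.stdout
    with gzip.open(path, "rb") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "example.txt.gz"
    with gzip.open(archive, "wb") as out:
        out.write(b"compressed payload\n")
    via_system_or_fallback = read_gzip_sketch(archive, use_system=True)
    via_python = read_gzip_sketch(archive, use_system=False)
```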

Returns:

A Process if file_type is PROCESS, or if file_type is None and path starts with ‘|’. Otherwise, an opened file-like object. If context_wrapper is True, this will be a subclass of FileLikeWrapper.

Raises:

ValueError – if any of the following is true:

  • compression is True and the compression format cannot be determined
  • the specified compression format is invalid
  • validate is True and the specified compression format is not the actual format of the file
  • the path or mode are invalid

xphyle.utils module

A collection of convenience methods for reading, writing, and otherwise managing files. All of these functions are ‘safe’, meaning that if you pass errors=False and there is a problem opening the file, the error will be handled gracefully.

class xphyle.utils.CompressOnClose(**kwargs)

Bases: xphyle.EventListener

Compress a file after it is closed.

compressed_path = None
execute(wrapper: xphyle.FileWrapper, **kwargs) → None

Handle an event. This method must be implemented by subclasses.

Parameters:
  • wrapper – The EventManager on which this event was registered.
  • kwargs – A union of the keyword arguments passed to the constructor and the __call__ method.
class xphyle.utils.CycleFileOutput(files: Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]] = None, char_mode: CharMode = None, **kwargs)

Bases: xphyle.utils.FileOutput

Alternate each line between files.

Parameters:
  • files – A list of files.
  • char_mode – The character mode.
class xphyle.utils.FileInput(files: Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]] = None, char_mode: CharMode = None)

Bases: xphyle.utils.FileManager, typing.Iterator

Similar to Python’s fileinput module, but uses xopen to open files. Currently only supports sequential line-oriented access via next or readline.

Parameters:
  • files – List of files.
  • char_mode – text or binary mode.

Notes

Default values are not allowed for generically typed parameters. In a future version, char_mode will default to None and it will be required to specify the mode, or use one of the convenience methods (textinput or byteinput).

add(path_or_file: Union[os.PathLike, IO, io.IOBase], key: Optional[Any] = None, **kwargs) → None

Overrides FileManager.add() to prevent file-specific open args.

filekey

The key of the file currently being read.

filename

The name of the file currently being read.

finished

Whether all data has been read from all files.

lineno

The total number of lines that have been read so far from all files.

readline() → CharMode

Read the next line from the current file (advancing to the next file if necessary and possible).

Returns:The next line, or the undefined string if self.finished==True.
class xphyle.utils.FileManager(files: Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]] = None, header=None, **kwargs)

Bases: collections.abc.Sized

Dict-like container for files. Files are opened lazily (upon first request) using xopen.

Parameters:
  • files – An iterable of files to add. Each item can either be a string path or a (key, fileobj) tuple.
  • header – A header to write when opening writable files.
  • kwargs – Default arguments to pass to xopen.
add(path_or_file: Union[os.PathLike, IO, io.IOBase], key: Optional[Any] = None, **kwargs) → None

Add a file.

Parameters:
  • path_or_file – Path or file object. If this is a path, the file will be opened with the specified mode.
  • key – Dict key. Defaults to the file name.
  • kwargs – Arguments to pass to xopen. These override any keyword arguments passed to the FileManager’s constructor.
add_all(files: Union[Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], Dict[Any, Union[os.PathLike, IO, io.IOBase]]], **kwargs) → None

Add all files from an iterable or dict.

Parameters:
  • files – An iterable or dict of files to add. If an iterable, each item can either be a string path or a (key, fileobj) tuple.
  • kwargs – Additional arguments to pass to add.
close() → None

Close all files being tracked.

get(key: Any) → Union[IO, io.IOBase, None]

Get the file object associated with a path. If the file is not already open, it is first opened with xopen.

Parameters:key – The file name/key.
Returns:The opened file.
get_path(key: Any) → pathlib.PurePath

Returns the file path associated with a key.

Parameters:key – The key to resolve.
Returns:The file path.
iter_files() → Generator[[Tuple[Any, Union[IO, io.IOBase]], None], None]

Iterates over all (key, file) pairs in the order they were added.

keys

Returns a list of all keys in the order they were added.

paths

Returns a list of all paths in the order they were added.

xphyle.utils.FileOrFilesArg = typing.Union[os.PathLike, typing.IO, io.IOBase, typing.Iterable[typing.Union[os.PathLike, typing.IO, io.IOBase, typing.Tuple[typing.Any, typing.Union[os.PathLike, typing.IO, io.IOBase]]]], NoneType]

A path or multiple files.

class xphyle.utils.FileOutput(files: Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]] = None, access: Union[str, xphyle.types.ModeAccess] = 'w', char_mode: Optional[CharMode] = None, linesep: Optional[CharMode] = None, encoding: str = 'utf-8', header: Optional[CharMode] = None)

Bases: xphyle.utils.FileManager, typing.Generic

Base class for file manager that writes to multiple files.

Parameters:
  • files – The list of files to open.
  • char_mode – The CharMode.
  • access – How to open the output files (‘w’, ‘a’, ‘x’).
  • linesep – The line separator (type must match char_mode).
  • encoding – Default character encoding to use.
  • header – Default file header to write when opening output files.

Notes

Default values for generically typed parameters are not allowed. In a future version, char_mode and linesep will default to None and must be explicitly defined.

write(data: Any, detect_newlines: bool = True) → int

Writes data to the output.

Parameters:
  • data – The data to write; will be converted to string/bytes.
  • detect_newlines – If True, data is split on linesep and the resulting lines are written using writelines, otherwise data is written using writeline.
Returns:

The number of characters written.

writeline(line: Union[bytes, str, None] = None) → Tuple[int, int]

Write a line to the output(s).

Parameters:line – The line to write.
Returns:The tuple (lines_written, chars_written).
writelines(lines: Iterable[Union[bytes, str]]) → Tuple[int, int]

Write an iterable of lines to the output(s).

Parameters:lines – An iterable of lines to write.
Returns:The tuple (lines_written, chars_written).
xphyle.utils.FilesArg

alias of typing.Iterable

class xphyle.utils.MoveOnClose(**kwargs)

Bases: xphyle.EventListener

Move a file after it is closed.

execute(wrapper: xphyle.FileWrapper, dest: pathlib.PurePath = None, **kwargs) → None

Handle an event. This method must be implemented by subclasses.

Parameters:
  • wrapper – The EventManager on which this event was registered.
  • kwargs – A union of the keyword arguments passed to the constructor and the __call__ method.
class xphyle.utils.NCycleFileOutput(files: Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]] = None, char_mode: CharMode = None, lines_per_file: int = 1, **kwargs)

Bases: xphyle.utils.FileOutput

Alternate output lines between files.

Parameters:
  • files – A list of files.
  • char_mode – The character mode.
  • lines_per_file – How many lines to write to a file before moving on to the next file.
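
The cycling behavior can be sketched with in-memory buffers. This is an illustration, not the class's implementation: cycle_lines is a hypothetical function, it appends a newline to every line, and it assumes the outputs are visited in the order given. With lines_per_file=1 it reduces to alternating each line between files.

```python
import io


def cycle_lines(lines, outputs, lines_per_file=1):
    """Hypothetical helper: write N lines to one output, then move on."""
    for index, line in enumerate(lines):
        # Integer division groups lines into blocks; modulo wraps around.
        target = outputs[(index // lines_per_file) % len(outputs)]
        target.write(line + "\n")


outputs = [io.StringIO(), io.StringIO()]
cycle_lines(["l1", "l2", "l3", "l4", "l5"], outputs, lines_per_file=2)
first_output = outputs[0].getvalue()
second_output = outputs[1].getvalue()
```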
class xphyle.utils.PatternFileOutput(filename_pattern: Optional[str] = None, char_mode: Optional[CharMode] = None, token_func: Callable[Union[bytes, str], Dict[Union[bytes, str], Any]] = <function PatternFileOutput.<lambda>>, **kwargs)

Bases: xphyle.utils.TokenFileOutput

Use a callable to generate filenames based on data in lines.

Parameters:
  • filename_pattern – The pattern of file names to create. Should have a single token (‘{}’ or ‘{0}’) that is replaced with the file index.
  • char_mode – The character mode.
  • token_func – Function to extract token(s) from lines in file. By default this is the identity function, which is almost never what you want.
  • kwargs – Additional args.
xphyle.utils.PatternOrFileOrFilesArg = typing.Union[str, os.PathLike, typing.IO, io.IOBase, typing.Iterable[typing.Union[os.PathLike, typing.IO, io.IOBase, typing.Tuple[typing.Any, typing.Union[os.PathLike, typing.IO, io.IOBase]]]], NoneType]

A pattern, path, file, or multiple files.

class xphyle.utils.RemoveOnClose(**kwargs)

Bases: xphyle.EventListener

Remove a file after it is closed.

execute(wrapper: xphyle.FileWrapper, **kwargs) → None

Handle an event. This method must be implemented by subclasses.

Parameters:
  • wrapper – The EventManager on which this event was registered.
  • kwargs – A union of the keyword arguments passed to the constructor and the __call__ method.
class xphyle.utils.RollingFileOutput(filename_pattern: Union[str, Iterable[str]] = None, char_mode: CharMode = None, lines_per_file: int = 1, **kwargs)

Bases: xphyle.utils.TokenFileOutput

Write up to lines_per_file lines to a file before opening the next file. File names are created from a pattern.

Parameters:
  • filename_pattern – The pattern of file names to create. Should have a single token (‘{}’ or ‘{0}’) that is replaced with the file index.
  • char_mode – The character mode.
  • lines_per_file – The max number of lines to write to each file.
  • kwargs – Additional args.
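
Rolling output by pattern can be sketched as follows. This is a simplified stand-in, not the class itself: write_rolling is a hypothetical function, it uses the builtin open in text mode, and it assumes the file index substituted into the pattern starts at 0.

```python
import tempfile
from pathlib import Path


def write_rolling(lines, filename_pattern, lines_per_file=1):
    """Hypothetical helper: start a new file every `lines_per_file` lines."""
    paths = []
    handle = None
    try:
        for index, line in enumerate(lines):
            if index % lines_per_file == 0:
                if handle is not None:
                    handle.close()
                # The file index fills the single token in the pattern.
                path = Path(filename_pattern.format(len(paths)))
                paths.append(path)
                handle = open(path, "w")
            handle.write(line + "\n")
    finally:
        if handle is not None:
            handle.close()
    return paths


with tempfile.TemporaryDirectory() as tmp:
    pattern = str(Path(tmp) / "part{}.txt")
    created = write_rolling(["a", "b", "c"], pattern, lines_per_file=2)
    names = [path.name for path in created]
    contents = [path.read_text() for path in created]
```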
class xphyle.utils.TeeFileOutput(files: Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]] = None, access: Union[str, xphyle.types.ModeAccess] = 'w', char_mode: Optional[CharMode] = None, linesep: Optional[CharMode] = None, encoding: str = 'utf-8', header: Optional[CharMode] = None)

Bases: xphyle.utils.FileOutput

Write output to multiple files simultaneously.

class xphyle.utils.TokenFileOutput(filename_pattern: Optional[str] = None, char_mode: Optional[CharMode] = None, **kwargs)

Bases: xphyle.utils.FileOutput

Generate file names according to a pattern.

Parameters:
  • filename_pattern – The pattern of file names to create. Should have a single token (‘{}’ or ‘{0}’) that is replaced with the file index.
  • char_mode – The character mode.
  • kwargs – Additional args.
xphyle.utils.byteinput(files: Union[os.PathLike, IO, io.IOBase, Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], None] = None)

Convenience method that creates a new FileInput in bytes mode.

Parameters:files – The files to open. If None, files passed on the command line are used, or STDIN if there are no command line arguments.
Returns:A FileInput[bytes] instance.
xphyle.utils.byteoutput(files: Union[os.PathLike, IO, io.IOBase, Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], None] = None, file_output_type: Callable[..., xphyle.utils.FileOutput[bytes]] = xphyle.utils.TeeFileOutput[bytes], **kwargs) → xphyle.utils.FileOutput[bytes]

Convenience function to create a fileoutput in bytes mode.

Parameters:
  • files – The files to write to.
  • file_output_type – The specific subclass of FileOutput to create.
  • kwargs – additional arguments to pass to the FileOutput constructor.
Returns:

A FileOutput instance.

xphyle.utils.compress_file(source_file: Union[os.PathLike, IO, io.IOBase], compressed_file: Union[os.PathLike, IO, io.IOBase] = None, compression: Union[bool, str] = None, keep: bool = True, compresslevel: int = None, use_system: bool = True, **kwargs) → pathlib.Path

Compress an existing file, either in-place or to a separate file.

Parameters:
  • source_file – Path or file-like object to compress.
  • compressed_file – The compressed path or file-like object. If None, compression is performed in-place. If True, file name is determined from source_file and the decompressed file is retained.
  • compression – If True, guess compression format from the file name, otherwise the name of any supported compression format.
  • keep – Whether to keep the source file.
  • compresslevel – Compression level.
  • use_system – Whether to try to use system-level compression.
  • kwargs – Additional arguments to pass to the open method when opening the compressed file.
Returns:

The path to the compressed file.

xphyle.utils.decompress_file(compressed_file: Union[os.PathLike, IO, io.IOBase], dest_file: Union[os.PathLike, IO, io.IOBase] = None, compression: Union[bool, str] = None, keep: bool = True, use_system: bool = True, **kwargs) → pathlib.Path

Decompress an existing file, either in-place or to a separate file.

Parameters:
  • compressed_file – Path or file-like object to decompress.
  • dest_file – Path or file-like object for the decompressed file. If None, file will be decompressed in-place. If True, file will be decompressed to a new file (and the compressed file retained) whose name is determined automatically.
  • compression – None or True, to guess compression format from the file name, or the name of any supported compression format.
  • keep – Whether to keep the source file.
  • use_system – Whether to try to use system-level compression
  • kwargs – Additional arguments to pass to the open method when opening the compressed file.
Returns:

The path of the decompressed file.

xphyle.utils.exec_process(*args, inp: Union[bytes, str] = None, timeout: int = None, **kwargs) → xphyle.Process

Shortcut to execute a process, wait for it to terminate, and return the results.

Parameters:
  • args – Positional arguments to popen.
  • inp – String/bytes to write to process input stream.
  • timeout – Time to wait for process to complete.
  • kwargs – Keyword arguments to popen.
Returns:

A terminated Process. The contents of stdout and stderr are recorded in the stdout and stderr attributes.

xphyle.utils.fileinput(files: Union[os.PathLike, IO, io.IOBase, Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], None] = None, char_mode: CharMode = None) → xphyle.utils.FileInput[CharMode]

Convenience method that creates a new FileInput.

Parameters:
  • files – The files to open. If None, files passed on the command line are used, or STDIN if there are no command line arguments.
  • char_mode – The default read mode (‘t’ for text or b’b’ for binary).
Returns:

A FileInput instance.

Notes

Default values are not allowed for generically typed parameters. Use textinput or byteinput instead.

xphyle.utils.fileoutput(files: Union[str, os.PathLike, IO, io.IOBase, Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], None] = None, char_mode: CharMode = None, linesep: CharMode = None, encoding: str = 'utf-8', file_output_type: Callable[..., xphyle.utils.FileOutput[CharMode]] = xphyle.utils.TeeFileOutput[~CharMode], **kwargs) → xphyle.utils.FileOutput[CharMode]

Convenience function to create a fileoutput.

Parameters:
  • files – The files to write to. Can include ‘-’/‘_’ for stdout/stderr.
  • char_mode – The write mode (‘t’ or b’b’).
  • linesep – The separator to use when writing lines.
  • encoding – The default file encoding to use.
  • file_output_type – The specific subclass of FileOutput to create.
  • kwargs – additional arguments to pass to the FileOutput constructor.
Returns:

A FileOutput instance.

Notes

Default values are not allowed for generically typed parameters. Use textoutput or byteoutput instead.

xphyle.utils.linecount(path_or_file: Union[os.PathLike, IO, io.IOBase], linesep: Optional[bytes] = None, buffer_size: int = 1048576, **kwargs) → int

Fastest pythonic way to count the lines in a file.

Parameters:
  • path_or_file – File object, or path to the file.
  • linesep – Line delimiter, specified as a byte string (e.g. b'\n').
  • buffer_size – How many bytes to read at a time (1 Mb by default).
  • kwargs – Additional arguments to pass to the file open method.
Returns:

The number of lines in the file. Blank lines (including the last line in the file) are included.
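
Buffered line counting can be sketched in a few lines. This is one reading of the description above, not necessarily the library's code: count_lines is a hypothetical function, it takes an already-open binary file object, and it assumes the result is the number of separators plus one for non-empty input (so a trailing separator adds a final blank line). It also assumes a single-byte separator, since a longer one could straddle a chunk boundary.

```python
import io


def count_lines(fileobj, linesep=b"\n", buffer_size=1024 * 1024):
    """Hypothetical helper: count lines by counting separators per chunk."""
    buf = fileobj.read(buffer_size)
    if not buf:
        return 0  # an empty file has no lines
    lines = 1
    while buf:
        lines += buf.count(linesep)
        buf = fileobj.read(buffer_size)
    return lines


with_trailing_newline = count_lines(io.BytesIO(b"a\nb\n"))
without_trailing_newline = count_lines(io.BytesIO(b"a\nb"))
empty = count_lines(io.BytesIO(b""))
small_buffer = count_lines(io.BytesIO(b"a\nb\nc"), buffer_size=2)
```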

xphyle.utils.read_bytes(path_or_file: Union[os.PathLike, IO, io.IOBase], chunksize: int = 1024, **kwargs) → Generator[[bytes, None], None]

Iterate over a file in chunks. The mode will always be overridden to ‘rb’.

Parameters:
  • path_or_file – Path to the file, or a file-like object.
  • chunksize – Number of bytes to read at a time.
  • kwargs – Additional arguments to pass to xphyle.open_.
Yields:

Chunks of the input file as bytes. Each chunk except the last should be of size chunksize.
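
Chunked reading can be sketched as a plain generator. The function name iter_chunks is hypothetical, and the sketch takes an already-open binary file object instead of going through xphyle.open_.

```python
import io


def iter_chunks(fileobj, chunksize=1024):
    """Hypothetical helper: yield successive chunks until the file is exhausted."""
    while True:
        chunk = fileobj.read(chunksize)
        if not chunk:
            break
        yield chunk


chunks = list(iter_chunks(io.BytesIO(b"abcdefg"), chunksize=3))
```

Every chunk except the last has exactly chunksize bytes in this example; the final chunk carries the remainder.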

xphyle.utils.read_delimited()

Iterate over rows in a delimited file.

Parameters:
  • path_or_file – Path to the file, or a file-like object.
  • sep – The field delimiter.
  • header – Either True or False to specify whether the file has a header, or a sequence of column names.
  • converters – callable, or iterable of callables, to call on each value.
  • yield_header – If header == True, whether the first row yielded should be the header row.
  • row_type – The collection type to return for each row: tuple, list, or dict.
  • kwargs – additional arguments to pass to csv.reader.
Yields:

Rows of the delimited file. If header==True, the first row yielded is the header row, and its type is always a list. Converters are not applied to the header row.

xphyle.utils.read_delimited_as_dict(path_or_file: Union[os.PathLike, IO, io.IOBase], sep: str = '\t', header: Union[bool, Sequence[str]] = False, key: Union[int, str, Callable[Sequence[str], Any]] = 0, **kwargs) → Dict[Any, Any]

Parse rows in a delimited file and add rows to a dict based on a specified key index or function.

Parameters:
  • path_or_file – Path to the file, or a file-like object.
  • sep – Field delimiter.
  • header – If True, read the header from the first line of the file, otherwise a list of column names.
  • key – The column to use as a dict key, or a function to extract the key from the row. If a string value, header must be specified. All values must be unique, or an exception is raised.
  • kwargs – Additional arguments to pass to read_delimited.
Returns:

A dict with as many elements as rows in the file.

Raises:

Exception if a duplicate key is generated.
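
The keying and duplicate check can be sketched with csv.reader from the standard library. This is an illustration under assumptions rather than the library's code: rows_by_key_sketch is a hypothetical name, header handling is omitted, and ValueError is chosen here only because the documentation does not name the exception type.

```python
import csv
import io


def rows_by_key_sketch(fileobj, sep="\t", key=0):
    """Index rows of a delimited file by a column index or a key function."""
    result = {}
    for row in csv.reader(fileobj, delimiter=sep):
        row_key = key(row) if callable(key) else row[key]
        if row_key in result:
            # Keys must be unique; the exception type is an assumption.
            raise ValueError(f"duplicate key: {row_key!r}")
        result[row_key] = row
    return result


table = io.StringIO("a\t1\nb\t2\n")
by_first_column = rows_by_key_sketch(table)
print(by_first_column)
```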

xphyle.utils.read_dict(path_or_file: Union[os.PathLike, IO, io.IOBase], sep: str = '=', convert: Optional[Callable[str, Any]] = None, ordered: bool = False, **kwargs) → Dict[str, Any]

Read lines from simple property file (key=value). Comment lines (starting with ‘#’) are ignored.

Parameters:
  • path_or_file – Property file, or a list of properties.
  • sep – Key-value delimiter (defaults to ‘=’).
  • convert – Function to call on each value.
  • ordered – Whether to return an OrderedDict.
  • kwargs – Additional arguments to pass to xphyle.open_.
Returns:

An OrderedDict, if ‘ordered’ is True, otherwise a dict.
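
A property file of this shape can be parsed in a few lines. The sketch below is standard-library only; parse_properties_sketch is a hypothetical name, and skipping blank lines and splitting at the first separator are assumptions, not documented xphyle behaviour.

```python
import io


def parse_properties_sketch(lines, sep="=", convert=None):
    """Parse key=value lines, ignoring comments that start with '#'."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines (assumption) and comment lines
        key, value = line.split(sep, 1)  # split at the first separator only
        result[key] = convert(value) if convert else value
    return result


props = parse_properties_sketch(
    io.StringIO("# settings\nthreads=4\nlevel=9\n"), convert=int
)
print(props)
```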

xphyle.utils.read_lines(path_or_file: Union[os.PathLike, IO, io.IOBase], convert: Optional[Callable[str, Any]] = None, strip_linesep: bool = True, **kwargs) → Generator[[str, None], None]

Iterate over lines in a file.

Parameters:
  • path_or_file – Path to the file, or a file-like object.
  • convert – Function to call on each line in the file.
  • strip_linesep – Whether to strip off trailing line separators.
  • kwargs – Additional arguments to pass to xphyle.open_.
Yields:

Lines of the file, with line endings stripped if strip_linesep is True.

xphyle.utils.textinput(files: Union[os.PathLike, IO, io.IOBase, Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], None] = None)

Convenience method that creates a new FileInput in text mode.

Parameters:files – The files to open. If None, files passed on the command line are used, or STDIN if there are no command line arguments.
Returns:A FileInput[Text] instance.
xphyle.utils.textoutput(files: Union[os.PathLike, IO, io.IOBase, Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], None] = None, file_output_type: Callable[..., xphyle.utils.FileOutput[str]] = xphyle.utils.TeeFileOutput[str], **kwargs) → xphyle.utils.FileOutput[str]

Convenience function to create a fileoutput in text mode.

Parameters:
  • files – The files to write to.
  • file_output_type – The specific subclass of FileOutput to create.
  • kwargs – additional arguments to pass to the FileOutput constructor.
Returns:

A FileOutput instance.

xphyle.utils.to_bytes(value: Any, encoding: str = 'utf-8')

Convert an arbitrary value to bytes.

Parameters:
  • value – Some value.
  • encoding – The byte encoding to use.
Returns:

value converted to a string and then encoded as bytes.

xphyle.utils.transcode_file(source_file: Union[os.PathLike, IO, io.IOBase], dest_file: Union[os.PathLike, IO, io.IOBase], source_compression: Union[bool, str] = True, dest_compression: Union[bool, str] = True, use_system: bool = True, source_open_args: Optional[dict] = None, dest_open_args: Optional[dict] = None) → None

Convert from one file format to another.

Parameters:
  • source_file – The path or file-like object to read from. If a file, it must be opened in mode ‘rb’.
  • dest_file – The path or file-like object to write to. If a file, it must be opened in binary mode.
  • source_compression – The compression type of the source file. If True, guess compression format from the file name, otherwise the name of any supported compression format.
  • dest_compression – The compression type of the dest file. If True, guess compression format from the file name, otherwise the name of any supported compression format.
  • use_system – Whether to use system-level compression.
  • source_open_args – Additional arguments to pass to xopen for the source file.
  • dest_open_args – Additional arguments to pass to xopen for the destination file.
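
To make the idea of transcoding concrete, here is a sketch that converts a gzip file to bzip2 by streaming through Python's own gzip and bz2 modules. It does not call xphyle and does not use system-level compression; gzip_to_bzip2_sketch and the file names are hypothetical.

```python
import bz2
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_to_bzip2_sketch(source: Path, dest: Path) -> None:
    """Stream-decompress `source` (gzip) and recompress it into `dest` (bzip2)."""
    with gzip.open(source, "rb") as src, bz2.open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)  # copies in chunks, never the whole file


with tempfile.TemporaryDirectory() as tmp:
    src_path = Path(tmp) / "data.txt.gz"
    dst_path = Path(tmp) / "data.txt.bz2"
    with gzip.open(src_path, "wb") as out:
        out.write(b"hello transcode\n")
    gzip_to_bzip2_sketch(src_path, dst_path)
    with bz2.open(dst_path, "rb") as inp:
        round_trip = inp.read()

print(round_trip)
```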
xphyle.utils.uncompressed_size(path: pathlib.PurePath, compression: Union[bool, str] = None) → Optional[int]

Get the uncompressed size of the compressed file.

Parameters:
  • path – The path to the compressed file.
  • compression – None or True, to guess compression format from the file name, or the name of any supported compression format.
Returns:

The uncompressed size of the file in bytes, or None if the uncompressed size could not be determined (without actually decompressing the file).

Raises:

ValueError if the compression format is not supported.
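
For gzip specifically, one way to learn the uncompressed size without decompressing is to read the ISIZE field that RFC 1952 places in the last four bytes of each member. Whether xphyle uses this approach is not stated here; the sketch below is an assumption-labelled illustration, gzip_isize_sketch is a hypothetical name, and the value is only reliable for single-member files smaller than 4 GiB because ISIZE is stored modulo 2**32.

```python
import gzip
import os
import struct
import tempfile
from pathlib import Path


def gzip_isize_sketch(path: Path) -> int:
    """Return the ISIZE trailer field: uncompressed size modulo 2**32."""
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # ISIZE is the final four bytes
        return struct.unpack("<I", f.read(4))[0]  # little-endian unsigned int


with tempfile.TemporaryDirectory() as tmp:
    gz_path = Path(tmp) / "sample.gz"
    gz_path.write_bytes(gzip.compress(b"x" * 1000))
    size = gzip_isize_sketch(gz_path)

print(size)
```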

xphyle.utils.write_bytes(iterable: Iterable, path_or_file: Union[os.PathLike, IO, io.IOBase], sep: Optional[bytes] = b'', convert: Callable[Any, bytes] = <function to_bytes>, **kwargs) → int

Write an iterable of bytes to a file.

Parameters:
  • iterable – An iterable.
  • path_or_file – Path to the file, or a file-like object.
  • sep – Separator between items.
  • convert – Function that converts a value to bytes.
  • kwargs – Additional arguments to pass to xphyle.open_.
Returns:

Total number of bytes written, or -1 if errors=False and there was a problem opening the file.

xphyle.utils.write_dict(dictobj: Dict[str, Any], path_or_file: Union[os.PathLike, IO, io.IOBase], sep: str = '=', linesep: Optional[str] = '\n', convert: Callable[Any, str] = <class 'str'>, **kwargs) → int

Write a dict to a file as name=value lines.

Parameters:
  • dictobj – The dict (or dict-like object).
  • path_or_file – Path to the file, or a file-like object.
  • sep – The delimiter between key and value (defaults to ‘=’).
  • linesep – The delimiter between values, or os.linesep if None (defaults to '\n').
  • convert – Function that converts a value to a string.
Returns:

Total number of bytes written, or -1 if errors=False and there was a problem opening the file.

xphyle.utils.write_lines(iterable: Iterable[str], path_or_file: Union[os.PathLike, IO, io.IOBase], linesep: Optional[str] = '\n', convert: Callable[Any, str] = <class 'str'>, **kwargs) → int

Write delimiter-separated strings to a file.

Parameters:
  • iterable – An iterable.
  • path_or_file – Path to the file, or a file-like object.
  • linesep – The delimiter to use to separate the strings, or os.linesep if None (defaults to '\n').
  • convert – Function that converts a value to a string.
  • kwargs – Additional arguments to pass to xphyle.open_.
Returns:

Total number of bytes written, or -1 if errors=False and there was a problem opening the file.

xphyle.paths module

Convenience functions for working with file paths.

Stdin, stdout, and stderr are treated as acceptable paths in most cases, which is why the PurePath type (Union[str, pathlib.PurePath]) is used. String paths are still accepted as inputs, but all outputs will be subclasses of pathlib.PurePath.

xphyle.paths.BACKCOMPAT = True

Whether backward compatibility is enabled. By default, backward compatibility is enabled unless environment variable XPHYLE_BACKCOMPAT is set to ‘0’.

class xphyle.paths.DirSpec(*path_vars, template: str = None, pattern: Union[str, Pattern[~AnyStr]] = None)

Bases: xphyle.paths.SpecBase

Spec for the directory part of a path.

default_pattern

The default filename pattern.

default_search_root() → pathlib.PurePath

Get the default root directory for searching.

default_var_name

The default variable name used for string formatting.

path_part(path: pathlib.Path) → str

Return the part of the absolute path corresponding to the spec type.

path_type

The PathType.

xphyle.paths.EXECUTABLE_CACHE = <xphyle.paths.ExecutableCache object>

Singleton instance of ExecutableCache.

class xphyle.paths.ExecutableCache(default_path: Optional[Iterable[pathlib.PurePath]] = None)

Bases: object

Lookup and cache executable paths.

Parameters:default_path – The default executable path
add_search_path(paths: Union[str, pathlib.PurePath, Iterable[pathlib.PurePath]]) → None

Add directories to the beginning of the executable search path.

Parameters:paths – List of paths, or a string with directories separated by os.pathsep.
get_path(executable: Union[str, pathlib.PurePath]) → pathlib.Path

Get the full path of executable.

Parameters:executable – An executable name or path.
Returns:The full path of executable, or None if the path cannot be found.
reset_search_path(default_path: Iterable[pathlib.PurePath] = None) → None

Reset the search path to default_path.

Parameters:default_path – The default executable path.
resolve_exe(names: Iterable[str]) → Optional[Tuple[pathlib.Path, str]]

Given an iterable of command names, find the first that resolves to an executable.

Parameters:names – An iterable of command names.
Returns:A tuple (path, name) of the first command to resolve, or None if none of the commands resolve.
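
The "first name that resolves" lookup can be approximated with shutil.which from the standard library. This sketch does not reproduce ExecutableCache (there is no caching); resolve_first_sketch and the tool names are hypothetical, and the demonstration assumes a POSIX system where an executable bit marks a runnable file.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def resolve_first_sketch(names, search_path=None):
    """Return (path, name) for the first name found on the search path, else None."""
    for name in names:
        found = shutil.which(name, path=search_path)
        if found:
            return Path(found), name
    return None


with tempfile.TemporaryDirectory() as tmp:
    tool = Path(tmp) / "mytool"
    tool.write_text("#!/bin/sh\nexit 0\n")
    tool.chmod(tool.stat().st_mode | stat.S_IXUSR)  # mark it executable
    hit = resolve_first_sketch(["no-such-tool", "mytool"], search_path=tmp)
    miss = resolve_first_sketch(["no-such-tool"], search_path=tmp)

print(hit, miss)
```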
class xphyle.paths.FileSpec(*path_vars, template: str = None, pattern: Union[str, Pattern[~AnyStr]] = None)

Bases: xphyle.paths.SpecBase

Spec for the filename part of a path.

Examples

spec = FileSpec(
    PathVar('id', pattern='[A-Z0-9_]+'),
    PathVar('ext', pattern=r'[^.]+'),
    template='{id}.{ext}'
)

# get a single file
path = spec(id='ABC123', ext='txt')  # => PathInst('ABC123.txt')
print(path['id'])  # => 'ABC123'

# get the variable values for a path
path = spec.parse('ABC123.txt')
print(path['id'])  # => 'ABC123'

# find all files that match a FileSpec in the user's home directory
all_paths = spec.find('~')  # => [PathInst…]

default_pattern

The default filename pattern.

default_var_name

The default variable name used for string formatting.

path_part(path: pathlib.Path) → str

Return the part of the absolute path corresponding to the spec type.

path_type

The PathType.

class xphyle.paths.PathInst

Bases: pathlib.PosixPath

A path-like that has a slot for variable values.

joinpath(*other) → xphyle.paths.PathInst

Join two path-like objects, including merging ‘values’ dicts.

values
class xphyle.paths.PathPathVar(name: str, undefined: pathlib.PurePath = PosixPath('.'), datatype: Callable[str, pathlib.Path] = <class 'pathlib.Path'>, **kwargs)

Bases: xphyle.paths.PathVar

class xphyle.paths.PathSpec(dir_spec: Union[pathlib.PurePath, xphyle.paths.DirSpec], file_spec: Union[str, xphyle.paths.FileSpec])

Bases: object

Specifies a path in terms of a template with named components (“path variables”).

Parameters:
  • dir_spec – A PurePath if the directory is fixed, otherwise a DirSpec.
  • file_spec – A string if the filename is fixed, otherwise a FileSpec.
construct(**kwargs) → xphyle.paths.PathInst

Create a new PathInst from this PathSpec using values in kwargs.

Parameters:kwargs – Specify values for path variables.
Returns:A PathInst
find(root: Optional[pathlib.PurePath] = None, path_types: Sequence[Union[str, xphyle.types.PathType]] = 'f', recursive: bool = False) → Sequence[xphyle.paths.PathInst]

Find all paths matching this PathSpec. The search starts in ‘root’ if it is not None, otherwise it starts in the deepest fixed directory of this PathSpec’s DirSpec.

Parameters:
  • root – Directory in which to begin the search.
  • path_types – Types to return – files (‘f’), directories (‘d’) or both (‘fd’).
  • recursive – Whether to search recursively.
Returns:

A sequence of PathInst.

parse(path: pathlib.PurePath) → xphyle.paths.PathInst

Extract PathVar values from path and create a new PathInst.

Parameters:path – The path to parse

Returns: a PathInst

class xphyle.paths.PathVar(name: str, optional: bool = False, default: Optional[T] = None, undefined: T = None, pattern: Union[str, Pattern[~AnyStr]] = None, valid: Iterable[T] = None, invalid: Iterable[T] = None, datatype: Callable[str, T] = None)

Bases: typing.Generic

Describes part of a path, used in PathSpec.

Parameters:
  • name – Path variable name
  • optional – Whether this part of the path is optional
  • default – A default value for this path variable
  • undefined – The value to use when the variable is undefined
  • pattern – A pattern that the value must match
  • valid – Iterable of valid values
  • invalid – Iterable of invalid values

If valid is specified, invalid and pattern are ignored. Otherwise, values are first checked against pattern (if one is specified), then checked against invalid (if specified).

as_pattern() → str

Format this variable as a regular expression capture group.
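
A regular expression capture group for a named path variable could look like the sketch below, which uses Python's named-group syntax. The exact string that as_pattern returns is not documented here, so treat the format as an assumption; as_named_group_sketch is a hypothetical helper.

```python
import re


def as_named_group_sketch(name, pattern):
    """Wrap `pattern` in a named capture group, e.g. (?P<id>[A-Z0-9_]+)."""
    return f"(?P<{name}>{pattern})"


# Combine two variables into a filename pattern like '{id}.{ext}'.
filename_re = re.compile(
    as_named_group_sketch("id", "[A-Z0-9_]+")
    + r"\."
    + as_named_group_sketch("ext", r"[^.]+")
)
match = filename_re.fullmatch("ABC123.txt")
values = match.groupdict()
print(values)
```

Named groups make it straightforward to turn a match back into a dict of variable values.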

xphyle.paths.STDERR = PurePosixPath('/dev/stderr')

Placeholder for sys.stderr.

xphyle.paths.STDERR_STR = '_'

String placeholder for stderr.

xphyle.paths.STDIN = PurePosixPath('/dev/stdin')

Placeholder for sys.stdin.

xphyle.paths.STDIN_OR_STDOUT = PurePosixPath('-')

Placeholder for stdin or stdout, when the access mode is not known.

xphyle.paths.STDIN_OR_STDOUT_STR = '-'

String placeholder for stdin/stdout.

xphyle.paths.STDOUT = PurePosixPath('/dev/stdout')

Placeholder for sys.stdout.

class xphyle.paths.SpecBase(*path_vars, template: str = None, pattern: Union[str, Pattern[~AnyStr]] = None)

Bases: object

Base class for DirSpec and FileSpec.

Parameters:
  • path_vars – Named variables with which to associate parts of a path.
  • template – Format string for creating paths from variables.
  • pattern – Regular expression for identifying matching paths.
construct(**kwargs) → xphyle.paths.PathInst

Create a new PathInst from this spec using values in kwargs.

Parameters:kwargs – Specify values for path variables.
Returns:A PathInst.
default_pattern

The default filename pattern.

default_search_root() → pathlib.PurePath

Get the default root directory for searching.

default_var_name

The default variable name used for string formatting.

find(root: Optional[pathlib.PurePath] = None, recursive: bool = False) → Sequence[xphyle.paths.PathInst]

Find all paths in root matching this spec.

Parameters:
  • root – Directory in which to begin the search.
  • recursive – Whether to search recursively.
Returns:

A sequence of PathInst.

parse(path: Union[str, pathlib.PurePath], fullpath: bool = False) → xphyle.paths.PathInst

Extract PathVar values from path and create a new PathInst.

Parameters:
  • path – The path to parse.
  • fullpath – Whether to extract the fully-resolved path.

Returns: a PathInst.

path_part(path: pathlib.Path) → str

Return the part of the absolute path corresponding to the spec type.

path_type

The PathType.

class xphyle.paths.StrPathVar(name: str, undefined: str = '', **kwargs)

Bases: xphyle.paths.PathVar

class xphyle.paths.TempDir(permissions: Union[xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]], None] = 'rwx', path_descriptors: Iterable[xphyle.paths.TempPathDescriptor] = None, **kwargs)

Bases: xphyle.paths.TempPathManager, xphyle.paths.TempPath

Context manager that creates a temporary directory and cleans it up upon exit.

Parameters:
  • permissions – Access mode to set on the temp directory. All subdirectories and files will inherit this mode unless explicitly set to be different.
  • path_descriptors – Iterable of TempPathDescriptors.
  • kwargs – Additional arguments passed to tempfile.mkdtemp.

By default all subdirectories and files inherit the mode of the temporary directory. If TempPathDescriptors are specified, the paths are created before permissions are set, enabling creation of a read-only temporary file system.
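
The ordering matters: paths have to exist before write access is removed. The sketch below shows that order with tempfile and pathlib only. It is not TempDir itself; build_read_only_tree_sketch is a hypothetical name, and the code assumes POSIX permission semantics.

```python
import stat
import tempfile
from pathlib import Path


def build_read_only_tree_sketch():
    """Create a file, then make it and its directory read-only; return both modes."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data_file = root / "data.txt"
        data_file.write_text("hello")            # 1. create paths while writable
        data_file.chmod(stat.S_IRUSR)            # 2. then restrict the file...
        root.chmod(stat.S_IRUSR | stat.S_IXUSR)  #    ...and the directory
        modes = (
            stat.S_IMODE(root.stat().st_mode),
            stat.S_IMODE(data_file.stat().st_mode),
        )
        root.chmod(stat.S_IRWXU)  # restore write access so cleanup can delete
        return modes


dir_mode, file_mode = build_read_only_tree_sketch()
print(oct(dir_mode), oct(file_mode))
```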

absolute_path

The absolute path.

close() → None

Delete the temporary directory and all files/subdirectories within.

make_directory(desc: xphyle.paths.TempPathDescriptor = None, apply_permissions: bool = True, **kwargs) → pathlib.Path

Convenience method; calls make_path with path_type=’d’.

make_empty_files(num_files: int, **kwargs) → Sequence[pathlib.Path]

Create randomly-named empty files.

Parameters:
  • num_files – The number of files to create.
  • kwargs – Arguments to pass to TempPathDescriptor.
Returns:

A sequence of paths.

make_fifo(desc: xphyle.paths.TempPathDescriptor = None, apply_permissions: bool = True, **kwargs) → pathlib.Path

Convenience method; calls make_path with path_type=’|’.

make_file(desc: xphyle.paths.TempPathDescriptor = None, apply_permissions: bool = True, **kwargs) → pathlib.Path

Convenience method; calls make_path with path_type=’f’.

make_path(desc: xphyle.paths.TempPathDescriptor = None, apply_permissions: bool = True, **kwargs) → pathlib.Path

Create a file or directory within the TempDir.

Parameters:
  • desc – A TempPathDescriptor.
  • apply_permissions – Whether permissions should be applied to the new file/directory.
  • kwargs – Arguments to TempPathDescriptor. Ignored unless desc is None.
Returns:

The absolute path to the new file/directory.

make_paths(*path_descriptors) → Sequence[pathlib.Path]

Create multiple files/directories at once. The paths are created before permissions are set, enabling creation of a read-only temporary file system.

Parameters:path_descriptors – One or more TempPathDescriptor.
Returns:A list of the created paths.
relative_path

The relative path.

class xphyle.paths.TempPath(parent: Union[pathlib.Path, TempPath] = None, permissions: Union[xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]], None] = 'rwx', path_type: Union[str, xphyle.types.PathType] = 'd', root: Optional[TempPathManager] = None)

Bases: object

Base class for temporary files/directories.

Parameters:
  • parent – The parent directory.
  • permissions – The access permissions.
  • path_type – ‘f’ = file, ‘d’ = directory.
absolute_path

The absolute path.

exists

Whether the directory exists.

permissions

The permissions of the path. Defaults to the parent’s mode.

relative_path

The relative path.

set_permissions(permissions: Union[xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]], None] = None, set_parent: bool = False, additive: bool = False) → Optional[xphyle.types.PermissionSet]

Set the permissions for the path.

Parameters:
  • permissions – The new flags to set. If None, the existing flags are used.
  • set_parent – Whether to recursively set the permissions of all parents. This is done additively.
  • additive – Whether permissions should be additive (e.g. if permissions == ‘w’ and self.permissions == ‘r’, the new mode is ‘rw’).
Returns:

The PermissionSet representing the flags that were set.
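
The additive option can be illustrated with plain 'rwx' flag strings. This sketch does not use PermissionSet; merge_permissions_sketch is a hypothetical helper that only mirrors the 'r' plus 'w' gives 'rw' example from the parameter description.

```python
def merge_permissions_sketch(current: str, new: str, additive: bool = False) -> str:
    """Combine permission flags; when additive, keep the existing flags too."""
    flags = set(new) | (set(current) if additive else set())
    # Emit flags in canonical r, w, x order.
    return "".join(flag for flag in "rwx" if flag in flags)


print(merge_permissions_sketch("r", "w", additive=True))
print(merge_permissions_sketch("r", "w"))
```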

class xphyle.paths.TempPathDescriptor(name: str = None, parent: Union[pathlib.PurePath, xphyle.paths.TempPath, None] = None, permissions: Union[xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]], None] = None, suffix: str = '', prefix: str = '', contents: str = '', path_type: Union[str, xphyle.types.PathType] = 'f', root: Optional[TempPathManager] = None)

Bases: xphyle.paths.TempPath

Describes a temporary file or directory within a TempDir.

Parameters:
  • name – The file/directory name.
  • parent – The parent directory, a TempPathDescriptor.
  • permissions – The permissions mode.
  • suffix, prefix – The suffix and prefix to use when calling mkstemp or mkdtemp.
  • path_type – ‘f’ (for file), ‘d’ (for directory), or ‘|’ (for FIFO).
absolute_path

The absolute path.

create(apply_permissions: bool = True) → None

Create the file/directory.

Parameters:apply_permissions – Whether to set permissions according to self.permissions.
relative_path

The relative path.

class xphyle.paths.TempPathManager

Bases: object

Base for classes that manage mapping between paths and TempPathDescriptors.

clear()
xphyle.paths.abspath(path: pathlib.PurePath) → pathlib.PurePath

Returns the fully resolved path associated with path.

Parameters:path – Relative or absolute path
Returns:A PurePath - typically a pathlib.Path, but may be STDOUT or STDERR.

Examples

abspath('foo')  # -> /path/to/curdir/foo
abspath('~/foo')  # -> /home/curuser/foo

xphyle.paths.as_path(path: Union[str, pathlib.PurePath], access: Union[str, xphyle.types.ModeAccess, None] = None) → pathlib.Path

Convert a string to a Path. Note that trying to use STDIN/STDOUT/STDERR as actual paths on Windows will result in an error.

Parameters:
  • path – String to convert. May be a string path, a stdin/stdout/stderr placeholder, or file:// URL. If it is already a Path, it is returned without modification.
  • access – The file access mode, to disambiguate stdin/stdout when path is the placeholder (‘-‘).
Returns:

A Path instance.

Raises:

ValueError if ‘path’ is a stdin/stdout placeholder and ‘access’ is None.
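
The placeholder disambiguation can be sketched as follows. The constants copy the values documented for STDIN, STDOUT, and STDERR in this module, but resolve_placeholder_sketch is a hypothetical function, and treating any access string that starts with 'r' as read access is an assumption.

```python
from pathlib import PurePosixPath

STDIN = PurePosixPath("/dev/stdin")
STDOUT = PurePosixPath("/dev/stdout")
STDERR = PurePosixPath("/dev/stderr")


def resolve_placeholder_sketch(path: str, access=None) -> PurePosixPath:
    """Map '-' and '_' to standard-stream paths; pass other strings through."""
    if path == "_":
        return STDERR
    if path == "-":
        if access is None:
            # '-' is ambiguous without knowing whether we read or write.
            raise ValueError("access mode required to resolve '-'")
        return STDIN if access.startswith("r") else STDOUT
    return PurePosixPath(path)


print(resolve_placeholder_sketch("-", "r"))
print(resolve_placeholder_sketch("-", "w"))
```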

xphyle.paths.as_pure_path(path: Union[str, pathlib.PurePath], access: Union[str, xphyle.types.ModeAccess, None] = None) → pathlib.PurePath

Convert a string to a PurePath.

Parameters:
  • path – String to convert. May be a string path, a stdin/stdout/stderr placeholder, or file:// URL. If it is already a PurePath, it is returned without modification.
  • access – The file access mode, to disambiguate stdin/stdout when path is the placeholder (‘-‘).
Returns:

A PurePath instance. Except when 'path' is a PurePath or a stdin/stdout/stderr placeholder, the actual return type is a Path instance.

xphyle.paths.check_access(path: pathlib.PurePath, permissions: Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess, xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]]]) → xphyle.types.PermissionSet

Check that path is accessible with the given set of permissions.

Parameters:
  • path – The path to check.
  • permissions – Access specifier (string/int/ModeAccess).
Raises:

IOError if the path cannot be accessed according to permissions.

xphyle.paths.check_path(path: pathlib.PurePath, path_type: Union[str, xphyle.types.PathType] = None, permissions: Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess, xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]]] = None) → pathlib.PurePath

Resolves the path (using resolve_path) and checks that the path is of the specified type and allows the specified access.

Parameters:
  • path – The path to check.
  • path_type – A string or PathType (‘f’ or ‘d’).
  • permissions – Access flag (string, int, Permission, or PermissionSet).
Returns:

The fully resolved path.

Raises:
  • IOError if the path does not exist, is not of the specified type, or doesn't allow the specified access.
xphyle.paths.check_readable_file(path: pathlib.PurePath) → pathlib.PurePath

Check that path exists and is readable.

Parameters:path – The path to check
Returns:The fully resolved path of path
xphyle.paths.check_std(path: pathlib.PurePath, error: bool = False) → bool

Check whether the path is ‘-‘ (stdout) or ‘_’ (stderr).

Parameters:
  • path – The path to check.
  • error – Whether an error should be raised if path is stdout or stderr.
Returns:

True if path is stdout or stderr.

Raises:

ValueError if path is stdout or stderr and error is True.

xphyle.paths.check_writable_file(path: pathlib.PurePath, mkdirs: bool = True) → pathlib.PurePath

If path exists, check that it is writable, otherwise check that its parent directory exists and is writable.

Parameters:
  • path – The path to check.
  • mkdirs – Whether to create any missing directories (True).
Returns:

The fully resolved path.

xphyle.paths.convert_std_placeholder(path: str, access: Union[str, xphyle.types.FileMode, xphyle.types.ModeAccess, None] = None) → Union[str, pathlib.PurePath]
xphyle.paths.deprecated(msg: str)

Issue a deprecation warning.

Parameters:msg – The warning message to display.
xphyle.paths.deprecated_str_to_path(*args_to_convert, list_args: Optional[Sequence[Union[int, str]]] = None, dict_args: Optional[Sequence[Union[int, str]]] = None) → Callable

Decorator for a function that used to take paths as strings and now only takes them as os.PurePath objects. A deprecation warning is issued, and the string arguments are converted to paths before calling the function.

Backward compatibility can be disabled by the XPHYLE_BACKCOMPAT environment variable. If set to false (0), the func is returned immediately.

xphyle.paths.filename(path: pathlib.PurePath) → str

Equivalent to split_path(path)[1].

Parameters:path – The path.
Returns:The filename part of path (without any extensions).
xphyle.paths.find(root: pathlib.PurePath, pattern: Union[str, Pattern[~AnyStr]], path_types: Sequence[Union[str, xphyle.types.PathType]] = 'f', recursive: bool = True, return_matches: bool = False) → Union[Sequence[pathlib.PurePath], Sequence[Tuple[pathlib.PurePath, Match[~AnyStr]]]]

Find all paths under root that match pattern.

Parameters:
  • root – Directory at which to start search.
  • pattern – File name pattern to match (string or re object).
  • path_types – Types to return – files ('f'), directories ('d'), or both ('fd').
  • recursive – Whether to search directories recursively.
  • return_matches – Whether to return regular expression match for each file.
Returns:

List of matching paths. If return_matches is True, each item will be a (path, Match) tuple.
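
A pattern-based search of this kind can be sketched with os.walk and re. The code is not xphyle's find: find_sketch is a hypothetical name, return_matches is left out, and matching the whole file name with fullmatch is an assumption about how the pattern is applied.

```python
import os
import re
import tempfile
from pathlib import Path


def find_sketch(root, pattern, path_types="f", recursive=True):
    """Return sorted paths under `root` whose names fully match `pattern`."""
    regex = re.compile(pattern)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        names = []
        if "f" in path_types:
            names.extend(filenames)
        if "d" in path_types:
            names.extend(dirnames)
        found.extend(Path(dirpath) / n for n in names if regex.fullmatch(n))
        if not recursive:
            break  # os.walk yields the top directory first
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sub").mkdir()
    for rel in ("a.txt", "c.log", "sub/b.txt"):
        (base / rel).write_text("")
    recursive_names = [p.name for p in find_sketch(base, r".*\.txt")]
    top_level_names = [p.name for p in find_sketch(base, r".*\.txt", recursive=False)]
    dir_names = [p.name for p in find_sketch(base, r"s.*", path_types="d")]

print(recursive_names, top_level_names, dir_names)
```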

xphyle.paths.get_permissions(path: pathlib.PurePath) → xphyle.types.PermissionSet

Get the permissions of a file/directory.

Parameters:path – Path of file/directory.
Returns:A PermissionSet.
Raises:IOError if the file/directory doesn’t exist.
xphyle.paths.get_root(path: Optional[pathlib.PurePath] = None) → str

Get the root directory.

Parameters:path – A path, or ‘.’ to get the root of the working directory, or None to get the root of the path to the script. Stdout and stderr are not valid arguments.
Returns:A string path to the root directory.
xphyle.paths.match_to_dict(match: Match[~AnyStr], path_vars: Dict[str, xphyle.paths.PathVar], errors: bool = True) → Optional[Dict[str, Any]]

Convert a regular expression Match to a dict of (name, value) for all PathVars.

Parameters:
  • match – A re.Match.
  • path_vars – A dict of PathVars.
  • errors – If True, raise an exception on validation error, otherwise return None.
Returns:

A (name, value) dict.

Raises:

ValueError if any values fail validation.

xphyle.paths.path_inst(path: Union[str, pathlib.PurePath], values: dict = None) → xphyle.paths.PathInst

Create a PathInst from a path and values dict.

Parameters:
  • path – The path.
  • values – The values dict.
Returns:

A PathInst.

xphyle.paths.resolve_path(path: pathlib.PurePath, parent: pathlib.PurePath = None) → pathlib.PurePath

Resolves the absolute path of the specified file and ensures that the file/directory exists.

Parameters:
  • path – Path to resolve.
  • parent – The directory containing path if path is relative.
Returns:

The absolute path.

Raises:

IOError – if the path does not exist or is invalid.

xphyle.paths.safe_check_path(path: pathlib.PurePath, *args, **kwargs) → Optional[pathlib.PurePath]

Safe version of check_path. Returns None rather than raising an exception.

xphyle.paths.safe_check_readable_file(path: pathlib.PurePath) → Optional[pathlib.PurePath]

Safe version of check_readable_file. Returns None rather than raising an exception.

xphyle.paths.safe_check_writable_file(path: pathlib.PurePath) → Optional[pathlib.PurePath]

Safe version of check_writable_file. Returns None rather than raising an exception.

xphyle.paths.set_permissions(path: pathlib.PurePath, permissions: Union[xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]]]) → xphyle.types.PermissionSet

Sets file stat flags (using chmod).

Parameters:
  • path – The file to chmod.
  • permissions – Stat flags (any of 'r', 'w', 'x', or a PermissionSet).
Returns:

A PermissionSet.

xphyle.paths.split_path(path: pathlib.PurePath, keep_seps: bool = True, resolve: bool = True) → Tuple[str, ...]

Splits a path into a (parent_dir, name, *ext) tuple.

Parameters:
  • path – The path. Stdout and stderr are not valid arguments.
  • keep_seps – Whether the extension separators should be kept as part of the file extensions
  • resolve – Whether to resolve the path before splitting
Returns:

A tuple of length >= 2, in which the first element is the parent directory, the second element is the file name, and the remaining elements are file extensions.

Examples

split_path('myfile.foo.txt', False)  # -> ('/current/dir', 'myfile', 'foo', 'txt')
split_path('/usr/local/foobar.gz', True)  # -> ('/usr/local', 'foobar', '.gz')
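
The (parent_dir, name, *ext) shape can be reproduced with pathlib as a rough sketch. It skips the resolve step, so relative paths are not made absolute; split_path_sketch is a hypothetical name, and names that begin with a dot are not treated specially here.

```python
from pathlib import PurePosixPath


def split_path_sketch(path, keep_seps=True):
    """Split a path into (parent_dir, name, *extensions) without resolving it."""
    p = PurePosixPath(path)
    name, *exts = p.name.split(".")
    if keep_seps:
        exts = ["." + ext for ext in exts]  # keep the leading dot on each extension
    return (str(p.parent), name, *exts)


print(split_path_sketch("/usr/local/foobar.gz"))
print(split_path_sketch("/current/dir/myfile.foo.txt", keep_seps=False))
```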

Plugin API

You shouldn’t need these modules unless you want to extend xphyle functionality.

xphyle.formats module

Interfaces to compression file formats. Magic numbers from: https://en.wikipedia.org/wiki/List_of_file_signatures

class xphyle.formats.BGzip

Bases: xphyle.formats.DualExeCompressionFormat

bgzip is block gzip. bgzip files are compatible with gzip. Typically, this format is only used when specifically requested, or when a bgzip file specifically has a .bgz (rather than .gz) extension.

The bgzip program is only used for compression; gzip is used for decompression because bgzip does not support decompressing a file with a non-.gz extension.

aliases

All of the aliases by which this format is known.

allowed_exts

Extensions that are allowed to be used. Defaults to self.exts.

compress_commands
compresslevel_range

The range of valid compression levels – (lowest, highest).

decompress_commands
default_compresslevel

The default compression level, if compression is supported and is user-configurable, otherwise None.

exts

The commonly used file extensions.

get_compress_command(src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: int = None) → List[str]

Build the compress command for the system executable.

Parameters:
  • src – The source file path, or STDIN if input should be read from stdin
  • stdout – Whether output should go to stdout
  • compresslevel – Integer compression level; typically 1-9
Returns:

List of command arguments

get_decompress_command(src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True) → List[str]

Build the decompress command for the system executable.

Parameters:
  • src – The source file path, or STDIN if input should be read from stdin
  • stdout – Whether output should go to stdout
Returns:

List of command arguments

magic_bytes

The initial bytes that indicate the file type.
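
Magic bytes allow a format to be guessed from file content rather than from the extension. The sketch below checks the well-known gzip (1f 8b) and bzip2 ('BZh') signatures using only the standard library; MAGIC_SKETCH and guess_format_sketch are hypothetical names and do not reflect how xphyle stores or consults its magic_bytes values.

```python
import bz2
import gzip

# Leading byte signatures for two common formats.
MAGIC_SKETCH = {
    "gzip": b"\x1f\x8b",
    "bzip2": b"BZh",
}


def guess_format_sketch(data: bytes):
    """Return the format whose signature prefixes `data`, or None."""
    for name, magic in MAGIC_SKETCH.items():
        if data.startswith(magic):
            return name
    return None


print(guess_format_sketch(gzip.compress(b"payload")))
print(guess_format_sketch(bz2.compress(b"payload")))
print(guess_format_sketch(b"plain text"))
```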

mime_types

The MIME types.

module_name
name

The canonical format name.

open_file_python(path_or_file: Union[os.PathLike, IO, io.IOBase], mode: Union[str, xphyle.types.FileMode], **kwargs) → Union[IO, io.IOBase]

Open a file using the python library.

Parameters:
  • path_or_file – The file to open – a path or open file object.
  • mode – The file open mode.
  • kwargs – Additional arguments to pass to the open method.
Returns:

A file-like object.

class xphyle.formats.BZip2

Bases: xphyle.formats.SingleExeCompressionFormat

Implementation of CompressionFormat for bzip2 files.

compresslevel_range

The range of valid compression levels – (lowest, highest).

default_compresslevel

The default compression level, if compression is supported and is user-configurable, otherwise None.

exts

The commonly used file extensions.

get_command(operation: str, src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: Optional[int] = 6) → List[str]

Build the command for the system executable.

Parameters:
  • operation – ‘c’ = compress, ‘d’ = decompress
  • src – The source file path, or STDIN if input should be read from stdin
  • stdout – Whether output should go to stdout
  • compresslevel – Integer compression level; typically 1-9
Returns:

List of command arguments

magic_bytes

The initial bytes that indicate the file type.

mime_types

The MIME types.

name

The canonical format name.

open_file_python(path_or_file: Union[os.PathLike, IO, io.IOBase], mode: Union[str, xphyle.types.FileMode], **kwargs) → Union[IO, io.IOBase]

Open a file using the python library.

Parameters:
  • path_or_file – The file to open – a path or open file object.
  • mode – The file open mode.
  • kwargs – Additional arguments to pass to the open method.
Returns:

A file-like object.

system_commands

The names of the system-level commands, in order of preference.

class xphyle.formats.CompressionFormat

Bases: xphyle.formats.FileFormat

Base class for classes that provide access to system-level and python-level implementations of compression formats.

aliases

All of the aliases by which this format is known.

allowed_exts

Extensions that are allowed to be used. Defaults to self.exts.

can_use_system_compression

Whether at least one command in self.system_commands resolves to an existing, executable file.

can_use_system_decompression

Whether at least one command in self.system_commands resolves to an existing, executable file.
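
As an illustration of the check described above, the following is a minimal sketch that uses only the standard library. The helper name resolve_first_executable and the command list ["pigz", "gzip"] are assumptions made for the example; they are not taken from xphyle.

```python
import shutil
from typing import Optional, Sequence


def resolve_first_executable(commands: Sequence[str]) -> Optional[str]:
    """Return the path of the first command found on PATH, or None."""
    for command in commands:
        # shutil.which returns None if the command is missing or not executable.
        path = shutil.which(command)
        if path is not None:
            return path
    return None


# Hypothetical preference order: try a parallel implementation first.
gzip_path = resolve_first_executable(["pigz", "gzip"])
can_use_system = gzip_path is not None
```

Ordering the candidates by preference means a faster tool is picked up when installed, while the more common tool still serves as a fallback.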

compress(raw_bytes: bytes, **kwargs) → bytes

Compress bytes.

Parameters:
  • raw_bytes – The bytes to compress
  • kwargs – Additional arguments to compression function.
Returns:

The compressed bytes

compress_file(source: Union[os.PathLike, IO, io.IOBase], dest: Union[os.PathLike, IO, io.IOBase] = None, keep: bool = True, compresslevel: int = None, use_system: bool = True, **kwargs) → pathlib.PurePath

Compress data from one file and write to another.

Parameters:
  • source – Source file, either a path or an open file-like object.
  • dest – Destination file, either a path or an open file-like object. If None, the file name is determined from source.
  • keep – Whether to keep the source file.
  • compresslevel – Compression level.
  • use_system – Whether to try to use system-level compression.
  • kwargs – Additional arguments to pass to the open method when opening the destination file.
Returns:

Path to the destination file.

Raises:

IOError if there is an error compressing the file.
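
The behaviour of the dest and keep parameters can be sketched with the standard gzip module. This is not the xphyle implementation: compress_file_sketch is a hypothetical helper, and deriving the destination by appending ".gz" to the source name is an assumption.

```python
import gzip
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def compress_file_sketch(
    source: Path, dest: Optional[Path] = None, keep: bool = True, compresslevel: int = 6
) -> Path:
    """Gzip source into dest, optionally removing the source afterwards."""
    if dest is None:
        # Assumed naming rule: append the format's default extension.
        dest = source.with_name(source.name + ".gz")
    with open(source, "rb") as src, gzip.open(dest, "wb", compresslevel=compresslevel) as dst:
        shutil.copyfileobj(src, dst)
    if not keep:
        source.unlink()
    return dest


with tempfile.TemporaryDirectory() as tmp:
    src_path = Path(tmp) / "example.txt"
    src_path.write_bytes(b"hello\n" * 100)
    out_path = compress_file_sketch(src_path, keep=False)
    out_name = out_path.name
    source_removed = not src_path.exists()
    with gzip.open(out_path, "rb") as fh:
        round_trip = fh.read()
```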

compress_iterable(strings: Iterable[str], delimiter: bytes = b'', encoding: str = 'utf-8', **kwargs) → bytes

Compress an iterable of strings using the python-level interface.

Parameters:
  • strings – An iterable of strings
  • delimiter – The delimiter (byte string) to use to separate strings
  • encoding – The byte encoding (utf-8)
  • kwargs – Additional arguments to compression function
Returns:

The compressed text, as bytes
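
One plausible reading of the delimiter and encoding parameters is that each string is encoded, the encoded pieces are joined with the delimiter, and the result is compressed in one call. The sketch below assumes that reading and uses gzip from the standard library; compress_iterable_sketch is a hypothetical name.

```python
import gzip
from typing import Iterable


def compress_iterable_sketch(
    strings: Iterable[str], delimiter: bytes = b"", encoding: str = "utf-8"
) -> bytes:
    """Encode each string, join with the delimiter, and gzip the result."""
    joined = delimiter.join(s.encode(encoding) for s in strings)
    return gzip.compress(joined)


compressed = compress_iterable_sketch(["foo", "bar"], delimiter=b"\n")
restored = gzip.decompress(compressed).decode("utf-8")
```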

compress_name

The name of the compression program.

compress_path

The path of the compression program.

compress_string(text: str, encoding: str = 'utf-8', **kwargs) → bytes

Compress a string.

Parameters:
  • text – The text to compress
  • encoding – The byte encoding (utf-8)
  • kwargs – Additional arguments to compression function
Returns:

The compressed text, as bytes

compresslevel_range

The range of valid compression levels – (lowest, highest).

decompress(compressed_bytes, **kwargs) → bytes

Decompress bytes.

Parameters:
  • compressed_bytes – The compressed data
  • kwargs – Additional arguments to the decompression function
Returns:

The decompressed bytes

decompress_file(source: Union[os.PathLike, IO, io.IOBase], dest: Union[os.PathLike, IO, io.IOBase, None] = None, keep: bool = True, use_system: bool = True, **kwargs) → pathlib.PurePath

Decompress data from one file and write to another.

Parameters:
  • source – Source file, either a path or an open file-like object.
  • dest – Destination file, either a path or an open file-like object. If None, the file name is determined from source.
  • keep – Whether to keep the source file.
  • use_system – Whether to try to use system-level compression.
  • kwargs – Additional arguments to pass to the open method when opening the compressed file.
Returns:

Path to the destination file.

Raises:

IOError if there is an error decompressing the file.

decompress_name

The name of the decompression program.

decompress_path

The path of the decompression program.

decompress_string(compressed_bytes: bytes, encoding: str = 'utf-8', **kwargs) → str

Decompress bytes and return as a string.

Parameters:
  • compressed_bytes – The compressed data
  • encoding – The byte encoding to use
  • kwargs – Additional arguments to the decompression function
Returns:

The decompressed data as a string
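
The string variants differ from the bytes variants only in the encode and decode steps. A minimal round-trip sketch, assuming bz2 as the python-level library and using hypothetical helper names:

```python
import bz2


def compress_string_sketch(text: str, encoding: str = "utf-8") -> bytes:
    """Encode the text, then compress the resulting bytes."""
    return bz2.compress(text.encode(encoding))


def decompress_string_sketch(compressed_bytes: bytes, encoding: str = "utf-8") -> str:
    """Decompress the bytes, then decode them with the same encoding."""
    return bz2.decompress(compressed_bytes).decode(encoding)


original = "naïve café"
packed = compress_string_sketch(original, encoding="latin-1")
unpacked = decompress_string_sketch(packed, encoding="latin-1")
```

The same encoding must be used in both directions; decoding with a different one can raise UnicodeDecodeError or silently produce different text.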

default_compresslevel

The default compression level, if compression is supported and is user-configurable, otherwise None.

default_ext

The default file extension for this format.

exts

The commonly used file extensions.

get_command(operation: str, src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: int = None) → List[str]

Build the command for the system executable.

Parameters:
  • operation – ‘c’ = compress, ‘d’ = decompress
  • src – The source file path, or STDIN if input should be read from stdin
  • stdout – Whether output should go to stdout
  • compresslevel – Integer compression level; typically 1-9
Returns:

List of command arguments
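
To make the parameters concrete, here is a sketch of how an argument list for the gzip executable could be assembled. The flags -d, -c and -1 through -9 are standard gzip options, but the function itself is hypothetical and the arguments xphyle actually emits may differ.

```python
from pathlib import PurePath, PurePosixPath
from typing import List, Optional

STDIN = PurePosixPath("/dev/stdin")


def gzip_command_sketch(
    operation: str,
    src: PurePath = STDIN,
    stdout: bool = True,
    compresslevel: Optional[int] = None,
    exe: str = "gzip",
) -> List[str]:
    """Build an argument list for compressing ('c') or decompressing ('d')."""
    if operation not in ("c", "d"):
        raise ValueError(f"Invalid operation: {operation!r}")
    cmd = [exe]
    if operation == "d":
        cmd.append("-d")
    elif compresslevel is not None:
        cmd.append(f"-{compresslevel}")
    if stdout:
        cmd.append("-c")  # write to stdout instead of replacing the file
    if src != STDIN:
        cmd.append(str(src))  # with no file argument, gzip reads stdin
    return cmd


compress_cmd = gzip_command_sketch("c", compresslevel=9)
decompress_cmd = gzip_command_sketch("d", PurePosixPath("/tmp/a.gz"))
```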

get_list_command(path: pathlib.PurePath) → Optional[List[str]]

Get the command to list contents of a compressed file.

Parameters:path – Path to the compressed file.
Returns:List of command arguments, or None if the uncompressed size cannot be determined (without actually decompressing the file).
handle_command_return(returncode: int, cmd: List[str], stderr: bytes = None) → None

Handle the returned values from executing a system-level command.

Parameters:
  • returncode – The returncode from the command (typically, anything other than 0 is an error).
  • cmd – The command that generated the return value.
  • stderr – The standard error from the command.
Raises:

IOError if the command output represents an error.

magic_bytes

The initial bytes that indicate the file type.

mime_types

The MIME types.

name

The canonical format name.

open_file(path: pathlib.PurePath, mode: Union[str, xphyle.types.FileMode], use_system: bool = True, **kwargs) → Union[IO, io.IOBase]

Opens a compressed file for reading or writing.

If use_system is True and the system provides an accessible executable, then system-level compression is used. Otherwise defaults to using the python implementation.

Parameters:
  • path – The path of the file to open.
  • mode – The file open mode.
  • use_system – Whether to attempt to use system-level compression.
  • kwargs – Additional arguments to pass to the python-level open method, if system-level compression isn’t used.
Returns:

A file-like object.

open_file_python(path_or_file: Union[os.PathLike, IO, io.IOBase], mode: Union[str, xphyle.types.FileMode], **kwargs) → Union[IO, io.IOBase]

Open a file using the python library.

Parameters:
  • path_or_file – The file to open – a path or open file object.
  • mode – The file open mode.
  • kwargs – Additional arguments to pass to the open method.
Returns:

A file-like object.

parse_file_listing(listing: str) → Tuple[int, int, float]

Parse the result of the list command.

Parameters:listing – The output of executing the list command.
Returns:A tuple (<compressed size in bytes>, <uncompressed size in bytes>, <compression ratio>).
system_commands

The names of the system-level commands, in order of preference.

uncompressed_size(path: pathlib.PurePath) → Optional[int]

Get the uncompressed size of a compressed file.

Parameters:path – Path to the compressed file.
Returns:The uncompressed size of the file in bytes, or None if the uncompressed size cannot be determined (without actually decompressing the file).
class xphyle.formats.DualExeCompressionFormat

Bases: xphyle.formats.CompressionFormat

CompressionFormat that uses different executables for compressing and decompressing.

compress_commands
compress_lib

Caches and returns the python module for compressing this file format.

Returns:The module
Raises:ImportError if the module cannot be imported.
compress_name

The name of the compression program.

compress_path

The path of the compression program.

decompress_commands
decompress_lib

Caches and returns the python module for decompressing this file format.

Returns:The module
Raises:ImportError if the module cannot be imported.
decompress_name

The name of the decompression program.

decompress_path

The path of the decompression program.

get_command(operation: str, src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: Optional[int] = None) → List[str]

Build the command for the system executable.

Parameters:
  • operation – ‘c’ = compress, ‘d’ = decompress
  • src – The source file path, or STDIN if input should be read from stdin
  • stdout – Whether output should go to stdout
  • compresslevel – Integer compression level; typically 1-9
Returns:

List of command arguments

get_compress_command(src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: int = None) → List[str]

Build the compress command for the system executable.

Parameters:
  • src – The source file path, or STDIN if input should be read from stdin
  • stdout – Whether output should go to stdout
  • compresslevel – Integer compression level; typically 1-9
Returns:

List of command arguments

get_decompress_command(src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True) → List[str]

Build the decompress command for the system executable.

Parameters:
  • src – The source file path, or STDIN if input should be read from stdin
  • stdout – Whether output should go to stdout
Returns:

List of command arguments

system_commands

The names of the system-level commands, in order of preference.

class xphyle.formats.FileFormat

Bases: abc.ABC

Base class for classes that wrap built-in python file format libraries. The subclass must provide the name member.

lib

Caches and returns the python module associated with this file format.

Returns:The module
Raises:ImportError if the module cannot be imported.
module_name
name
class xphyle.formats.Formats

Bases: object

Manages a set of compression formats.

compression_format_aliases = None

Dict mapping aliases to compression format names.

compression_formats = None

Dict of registered compression formats

get_compression_format(name: str) → xphyle.formats.CompressionFormat

Returns the CompressionFormat associated with the given name.

Raises:ValueError if that format is not supported.
get_compression_format_name(alias: str)

Returns the canonical name for the given alias.

get_format_for_mime_type(mime_type: str) → str

Returns the file format associated with a MIME type, or None if no format is associated with that MIME type.

guess_compression_format(name: Union[str, pathlib.PurePath]) → Optional[str]

Guess the compression format by name or file extension.

Returns:The format name, or None if it could not be guessed.
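
A guess by name or extension can be sketched as a two-step lookup: first treat the argument as a format name, then fall back to its file extension. The mapping below is a hypothetical stand-in for the registered formats, and guess_by_name_sketch is not an xphyle function.

```python
from pathlib import PurePath
from typing import Optional, Union

# Hypothetical extension table; the real registry is built from each format's exts.
EXT_TO_FORMAT = {"gz": "gzip", "bz2": "bzip2", "xz": "lzma", "zst": "zstd"}


def guess_by_name_sketch(name: Union[str, PurePath]) -> Optional[str]:
    """Return a format name for a format name or file name, else None."""
    text = str(name)
    if text in EXT_TO_FORMAT.values():
        return text
    suffix = PurePath(text).suffix.lstrip(".")
    return EXT_TO_FORMAT.get(suffix)


guesses = [
    guess_by_name_sketch("reads.fastq.gz"),
    guess_by_name_sketch("gzip"),
    guess_by_name_sketch("notes.txt"),
]
```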
guess_format_from_buffer(buffer: _io.BufferedReader) → Optional[str]

Guess file format from a byte buffer that provides a peek method.

Parameters:buffer – The buffer object
Returns:The format name, or None if it could not be guessed.
guess_format_from_file_header(path: pathlib.PurePath) → Optional[str]

Guess file format from ‘magic bytes’ at the beginning of the file.

Note that path must be openable and readable. If it is a named pipe or other pseudo-file type, the magic bytes will be destructively consumed, and thus the file will not open correctly afterwards.

Parameters:path – Path to the file
Returns:The format name, or None if it could not be guessed.
guess_format_from_header_bytes(header_bytes: bytes) → Optional[str]

Guess file format from a sequence of bytes from a file header.

Parameters:header_bytes – The bytes
Returns:The format name, or None if it could not be guessed.
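
Header-based detection can be illustrated with the well-known magic numbers of the four formats documented here. The sketch indexes the sequences by their first byte, similar in spirit to the magic_bytes attribute of Formats; the index holds a list of candidates per first byte, which is an assumption, and the function name is hypothetical.

```python
import bz2
import gzip
import lzma
from typing import Dict, List, Optional, Tuple

MAGIC = {
    "gzip": b"\x1f\x8b",
    "bzip2": b"BZh",
    "lzma": b"\xfd\x37\x7a\x58\x5a\x00",  # xz container
    "zstd": b"\x28\xb5\x2f\xfd",
}

# Index by first byte so only a few candidates are compared per header.
INDEX: Dict[int, List[Tuple[str, bytes]]] = {}
for fmt, magic in MAGIC.items():
    INDEX.setdefault(magic[0], []).append((fmt, magic[1:]))


def guess_from_header_sketch(header: bytes) -> Optional[str]:
    """Return the format whose magic sequence prefixes header, else None."""
    if not header:
        return None
    for fmt, rest in INDEX.get(header[0], []):
        if header[1:1 + len(rest)] == rest:
            return fmt
    return None


detected = [
    guess_from_header_sketch(gzip.compress(b"x")),
    guess_from_header_sketch(bz2.compress(b"x")),
    guess_from_header_sketch(lzma.compress(b"x")),
    guess_from_header_sketch(b"plain text"),
]
```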
has_compatible_extension(dest_fmt, ext_fmt) → bool

Checks that dest_fmt is allowed to use a file extension supported by ext_fmt. This is mostly to handle the special case where dest_fmt and ext_fmt allow the same extension and the actual format cannot be detected from the file header.

Returns:True if an allowed extension of dest_fmt is supported by ext_fmt else False.
list_compression_formats()

Returns a list of all registered compression formats.

list_extensions(with_sep: bool = False) → Iterable[str]

Returns an iterable with all valid extensions.

Parameters:with_sep – Add separator prefix to each extension.
magic_bytes = None

Dict mapping the first byte in a ‘magic’ sequence to a tuple of (format, rest_of_sequence)

max_magic_bytes = None

Maximum number of bytes in a registered magic byte sequence

mime_types = None

Dict mapping MIME types to file formats

register_compression_format(format_class: Callable[xphyle.formats.CompressionFormat]) → None

Register a new compression format.

Parameters:format_class – a subclass of CompressionFormat
class xphyle.formats.Gzip

Bases: xphyle.formats.SingleExeCompressionFormat

Implementation of CompressionFormat for gzip files.

compresslevel_range

The compression level; pigz allows 0-11 (har har) while gzip allows 0-9.

default_compresslevel

The default compression level, if compression is supported and is user-configurable, otherwise None.

exts

The commonly used file extensions.

get_command(operation: str, src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: int = None) → List[str]

Build the command for the system executable.

Parameters:
  • operation – ‘c’ = compress, ‘d’ = decompress
  • src – The source file path, or STDIN if input should be read from stdin
  • stdout – Whether output should go to stdout
  • compresslevel – Integer compression level; typically 1-9
Returns:

List of command arguments

get_list_command(path: pathlib.PurePath) → List[str]

Get the command to list contents of a compressed file.

Parameters:path – Path to the compressed file.
Returns:List of command arguments, or None if the uncompressed size cannot be determined (without actually decompressing the file).
handle_command_return(returncode: int, cmd: List[str], stderr: bytes = None) → None

Handle the returned values from executing a system-level command.

Parameters:
  • returncode – The returncode from the command (typically, anything other than 0 is an error).
  • cmd – The command that generated the return value.
  • stderr – The standard error from the command.
Raises:

IOError if the command output represents an error.

magic_bytes

The initial bytes that indicate the file type.

mime_types

The MIME types.

name

The canonical format name.

open_file_python(path_or_file: Union[os.PathLike, IO, io.IOBase], mode: Union[str, xphyle.types.FileMode], **kwargs) → Union[IO, io.IOBase]

Open a file using the python library.

Parameters:
  • path_or_file – The file to open – a path or open file object.
  • mode – The file open mode.
  • kwargs – Additional arguments to pass to the open method.
Returns:

A file-like object.

parse_file_listing(listing: str) → Tuple[int, int, float]

Parse the result of the list command.

Parameters:listing – The output of executing the list command.
Returns:A tuple (<compressed size in bytes>, <uncompressed size in bytes>, <compression ratio>).
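
For gzip, the list command is typically gzip -l, which prints a header row followed by one row per file. The sketch below parses a hand-written sample in that layout. Expressing the ratio as a fraction rather than a percentage is an assumption, and parse_gzip_listing_sketch is a hypothetical helper.

```python
from typing import Tuple

# Hand-written sample in the layout printed by `gzip -l`.
SAMPLE_LISTING = (
    "         compressed        uncompressed  ratio uncompressed_name\n"
    "               1034                4096  74.8% example.txt\n"
)


def parse_gzip_listing_sketch(listing: str) -> Tuple[int, int, float]:
    """Parse the last row of the listing into (compressed, uncompressed, ratio)."""
    fields = listing.strip().splitlines()[-1].split()
    compressed = int(fields[0])
    uncompressed = int(fields[1])
    ratio = float(fields[2].rstrip("%")) / 100
    return compressed, uncompressed, ratio


compressed_size, uncompressed_size, ratio = parse_gzip_listing_sketch(SAMPLE_LISTING)
```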
system_commands

The names of the system-level commands, in order of preference.

class xphyle.formats.Lzma

Bases: xphyle.formats.SingleExeCompressionFormat

Implementation of CompressionFormat for lzma (.xz) files.

compress(raw_bytes: bytes, **kwargs) → bytes

Compress bytes.

Parameters:
  • raw_bytes – The bytes to compress
  • kwargs – Additional arguments to compression function.
Returns:

The compressed bytes

compresslevel_range

The range of valid compression levels – (lowest, highest).

default_compresslevel

The default compression level, if compression is supported and is user-configurable, otherwise None.

exts

The commonly used file extensions.

get_command(operation: str, src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: Optional[int] = 6) → List[str]

Build the command for the system executable.

Parameters:
  • operation – ‘c’ = compress, ‘d’ = decompress
  • src – The source file path, or STDIN if input should be read from stdin
  • stdout – Whether output should go to stdout
  • compresslevel – Integer compression level; typically 1-9
Returns:

List of command arguments

get_list_command(path: pathlib.PurePath) → List[str]

Get the command to list contents of a compressed file.

Parameters:path – Path to the compressed file.
Returns:List of command arguments, or None if the uncompressed size cannot be determined (without actually decompressing the file).
magic_bytes

The initial bytes that indicate the file type.

mime_types

The MIME types.

name

The canonical format name.

parse_file_listing(listing: str) → Tuple[int, int, float]

Parse the result of the list command.

Parameters:listing – The output of executing the list command.
Returns:A tuple (<compressed size in bytes>, <uncompressed size in bytes>, <compression ratio>).
system_commands

The names of the system-level commands, in order of preference.

class xphyle.formats.SingleExeCompressionFormat

Bases: xphyle.formats.CompressionFormat

Base class for CompressionFormats that use the same executable for compressing and decompressing.

compress_name

The name of the compression program.

compress_path

The path of the compression program.

decompress_name

The name of the decompression program.

decompress_path

The path of the decompression program.

executable_name

The name of the system executable.

executable_path

The path of the system executable.

class xphyle.formats.SystemIO(path: pathlib.PurePath)

Bases: xphyle.types.FileLikeBase

Base class for SystemReader and SystemWriter.

Parameters:path – The file path.
closed
name
class xphyle.formats.SystemReader(executable_path: pathlib.PurePath, path: pathlib.PurePath, command: List[str], executable_name: str = None)

Bases: xphyle.formats.SystemIO

Read from a compressed file using a system-level compression program.

Parameters:
  • executable_path – The fully resolved path to the system executable
  • path – The compressed file to read
  • command – List of command arguments.
  • executable_name – The display name of the executable, or None to use the basename of executable_path
close() → None

Close the reader; terminates the underlying process.

flush() → None

Implementing file interface; no-op.

mode
read(*args) → bytes

Read bytes from the stream. Arguments are passed through to the subprocess read method.

readable() → bool

Implementing file interface; returns True.

class xphyle.formats.SystemWriter(executable_path: pathlib.PurePath, path: pathlib.PurePath, mode: Union[str, xphyle.types.FileMode] = 'w', command: List[str] = None, executable_name: str = None)

Bases: xphyle.formats.SystemIO

Write to a compressed file using a system-level compression program.

Parameters:
  • executable_path – The fully resolved path to the system executable.
  • path – The compressed file to write.
  • mode – The write mode (w/a/x).
  • command – Format string with two variables – exe (the path to the system executable), and path.
  • executable_name – The display name of the executable, or None to use the basename of executable_path.
close() → None

Close the writer; terminates the underlying process.

flush() → None

Flush stdin of the underlying process.

mode
writable() → bool

Implementing file interface; returns True.

write(arg) → int

Write to stdin of the underlying process.

xphyle.formats.THREADS = <xphyle.formats.ThreadsVar object>

Number of concurrent threads that can be used by formats that support parallelization.

class xphyle.formats.ThreadsVar(default_value: int = 1)

Bases: object

Maintain threads variable.

update(threads: Optional[int] = True) → None

Update the number of threads to use.

Parameters:threads – True = use all available cores; False or an int <= 1 means single-threaded; None means reset to the default value; otherwise an integer number of threads.
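
The rules for the threads argument can be sketched as follows. ThreadsSketch is a hypothetical stand-in for ThreadsVar, and using os.cpu_count() for "all available cores" is an assumption. Because bool is a subclass of int in Python, the True and False cases are tested by identity before any numeric comparison.

```python
import os
from typing import Union


class ThreadsSketch:
    """Hold a thread count that follows the documented update rules."""

    def __init__(self, default_value: int = 1) -> None:
        self.default_value = default_value
        self.threads = default_value

    def update(self, threads: Union[bool, int, None] = True) -> None:
        if threads is None:
            self.threads = self.default_value  # reset to the default
        elif threads is True:
            self.threads = os.cpu_count() or 1  # all available cores
        elif threads is False or threads <= 1:
            self.threads = 1  # single-threaded
        else:
            self.threads = threads


var = ThreadsSketch(default_value=2)
observed = []
for value in (8, False, 0, None):
    var.update(value)
    observed.append(var.threads)
```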
class xphyle.formats.Zstd

Bases: xphyle.formats.SingleExeCompressionFormat

Implementation of CompressionFormat for zstd (.zst) files.

compresslevel_range

The range of valid compression levels – (lowest, highest).

default_compresslevel

The default compression level, if compression is supported and is user-configurable, otherwise None.

exts

The commonly used file extensions.

get_command(operation: str, src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: int = None) → List[str]

Build the command for the system executable.

Parameters:
  • operation – ‘c’ = compress, ‘d’ = decompress
  • src – The source file path, or STDIN if input should be read from stdin
  • stdout – Whether output should go to stdout
  • compresslevel – Integer compression level; typically 1-9
Returns:

List of command arguments

get_list_command(path: pathlib.PurePath) → List[str]

Get the command to list contents of a compressed file.

Parameters:path – Path to the compressed file.
Returns:List of command arguments, or None if the uncompressed size cannot be determined (without actually decompressing the file).
magic_bytes

The initial bytes that indicate the file type.

mime_types

The MIME types.

module_name
name

The canonical format name.

open_file_python(path_or_file: Union[os.PathLike, IO, io.IOBase], mode: Union[str, xphyle.types.FileMode], **kwargs) → Union[IO, io.IOBase]

Open a file using the python library.

Parameters:
  • path_or_file – The file to open – a path or open file object.
  • mode – The file open mode.
  • kwargs – Additional arguments to pass to the open method.
Returns:

A file-like object.

parse_file_listing(listing: str) → Tuple[int, int, float]

Parse the result of the list command.

Parameters:listing – The output of executing the list command.
Returns:A tuple (<compressed size in bytes>, <uncompressed size in bytes>, <compression ratio>).
xphyle.formats.compression_format(cls)

Required decorator on concrete CompressionFormat subclasses. Registers the CompressionFormat in FORMATS.
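
The registration pattern behind such a decorator can be sketched with a plain dictionary. REGISTRY, register_format_sketch and ExampleFormat are hypothetical names, and whether xphyle stores the class or an instance of it is not stated here.

```python
from typing import Dict

REGISTRY: Dict[str, type] = {}


def register_format_sketch(cls: type) -> type:
    """Class decorator that records the class under its name attribute."""
    REGISTRY[cls.name] = cls
    return cls  # return the class unchanged so it remains usable


@register_format_sketch
class ExampleFormat:
    name = "example"
```

Returning the class unchanged keeps the decorator transparent: the decorated name still refers to the original class.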

xphyle.progress module

Common interface to enable operations to be wrapped in a progress bar. By default, pokrok is used for python-level operations and pv for system-level operations.

class xphyle.progress.IterableProgress(default_wrapper: Callable = <function progress_iter>)

Bases: object

Manages the python-level wrapper.

Parameters:default_wrapper – Callable (typically a class) that returns a Callable with the signature of wrap.
update(enable: Optional[bool] = None, wrapper: Optional[Callable[..., Iterable]] = None) → None

Enable the python progress bar and/or set a new wrapper.

Parameters:
  • enable – Whether to enable use of a progress wrapper.
  • wrapper – A callable that takes three arguments, itr, desc, size, and returns an iterable.
wrap(itr: Iterable, desc: Optional[str] = None, size: Optional[int] = None) → Iterable

Wrap an iterable in a progress bar.

Parameters:
  • itr – The Iterable to wrap.
  • desc – Optional description.
  • size – Optional max value of the progress bar.
Returns:

The wrapped Iterable.
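
A custom wrapper only needs to accept itr, desc and size and return an iterable. The generator below is a minimal sketch of such a callable; stderr_progress is a hypothetical name, and how it would be registered (presumably through the wrapper argument of update) is not shown.

```python
import sys
from typing import Iterable, Iterator, Optional


def stderr_progress(
    itr: Iterable, desc: Optional[str] = None, size: Optional[int] = None
) -> Iterator:
    """Yield items from itr while printing a running count to stderr."""
    label = desc or "progress"
    total = f"/{size}" if size is not None else ""
    for count, item in enumerate(itr, start=1):
        print(f"\r{label}: {count}{total}", end="", file=sys.stderr)
        yield item
    print(file=sys.stderr)  # finish the progress line


items = list(stderr_progress(range(3), desc="demo", size=3))
```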

class xphyle.progress.ProcessProgress(default_wrapper: Callable = <function pv_command>)

Bases: object

Manage the system-level progress wrapper.

Parameters:default_wrapper – Callable that returns the argument list for the default wrapper command.
update(enable: Optional[bool] = None, wrapper: Union[str, Sequence[str], None] = None) → None

Enable the system-level progress bar and/or set the wrapper command.

Parameters:
  • enable – Whether to enable use of a progress wrapper.
  • wrapper – A command string or sequence of command arguments.
wrap(cmd: Sequence[str], stdin: Union[IO, io.IOBase], stdout: Union[IO, io.IOBase], **kwargs) → subprocess.Popen

Pipe a system command through a progress bar program.

For the process to be wrapped, one of stdin, stdout must not be None.

Parameters:
  • cmd – Command arguments.
  • stdin – File-like object to read into the process stdin, or None to use PIPE.
  • stdout – File-like object to write from the process stdout, or None to use PIPE.
  • kwargs – Additional arguments to pass to Popen.
Returns:

Open process.

xphyle.progress.iter_file_chunked(fileobj: Union[IO, io.IOBase], chunksize: int = 1024) → Iterable

Returns a progress bar-wrapped iterator over a file that reads fixed-size chunks.

Parameters:
  • fileobj – A file-like object.
  • chunksize – The maximum size in bytes of each chunk.
Returns:

An iterable over the chunks of the file.
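
The chunked reading itself, without the progress bar, can be sketched in a few lines. iter_chunks_sketch is a hypothetical helper; it stops when read returns an empty result, so the last chunk may be shorter than chunksize.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks_sketch(fileobj: BinaryIO, chunksize: int = 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most chunksize bytes until EOF."""
    while True:
        chunk = fileobj.read(chunksize)
        if not chunk:
            break
        yield chunk


chunks = list(iter_chunks_sketch(io.BytesIO(b"a" * 2500), chunksize=1024))
sizes = [len(chunk) for chunk in chunks]
```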

xphyle.progress.pv_command()

Default system wrapper command.

xphyle.progress.system_progress_command()

Resolve a system-level progress bar command.

Parameters:
  • exe – The executable name or absolute path.
  • args – A list of additional command line arguments.
  • require – Whether to raise an exception if the command does not exist.
Returns:

A tuple of (executable_path, *args).

xphyle.urls module

Methods for handling URLs.

xphyle.urls.get_url_file_name(response: Any, parsed_url: Optional[urllib.parse.ParseResult] = None) → Optional[str]

If a response object has HTTP-like headers, extract the filename from the Content-Disposition header.

Parameters:
  • response – A response object returned by open_url.
  • parsed_url – The result of calling parse_url.
Returns:

The file name, or None if it could not be determined.
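
Extracting a file name from a Content-Disposition value can be sketched with the standard email.message.Message class, which understands the filename parameter. Falling back to the last path component of the URL is an assumption about how parsed_url might be used, and file_name_sketch is a hypothetical helper.

```python
from email.message import Message
from posixpath import basename
from typing import Optional
from urllib.parse import urlparse


def file_name_sketch(content_disposition: Optional[str], url: str) -> Optional[str]:
    """Return the header's filename parameter, else the URL path's basename."""
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()
        if name:
            return name
    # Assumed fallback: the last component of the URL path, if any.
    return basename(urlparse(url).path) or None


from_header = file_name_sketch(
    'attachment; filename="reads.fastq.gz"', "https://example.org/download?id=1"
)
from_url = file_name_sketch(None, "https://example.org/data/reads.fastq.gz")
missing = file_name_sketch(None, "https://example.org/")
```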

xphyle.urls.get_url_mime_type(response: Any) → Optional[str]

If a response object has HTTP-like headers, extract the MIME type from the Content-Type header.

Parameters:response – A response object returned by open_url.
Returns:The content type, or None if the response lacks a ‘Content-Type’ header.
xphyle.urls.open_url(url_string: str, byte_range: Optional[Tuple[int, int]] = None, headers: Optional[dict] = None, **kwargs) → Any

Open a URL for reading.

Parameters:
  • url_string – A valid url string.
  • byte_range – Range of bytes to read (start, stop).
  • headers – dict of request headers.
  • kwargs – Additional arguments to pass to urlopen.
Returns:

A response object, or None if the URL is not valid or cannot be opened.

Notes

The return value of urlopen is only guaranteed to have certain methods, not to be of any specific type, thus the Any return type. Furthermore, the response may be wrapped in an io.BufferedReader to ensure that a peek method is available.
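
A byte range is normally requested over HTTP with a Range header. The sketch below only builds the request object and does not open a connection. build_request_sketch is a hypothetical helper, and whether xphyle treats stop as inclusive (as the HTTP Range header does) is not stated here.

```python
from typing import Dict, Optional, Tuple
from urllib.request import Request


def build_request_sketch(
    url: str,
    byte_range: Optional[Tuple[int, int]] = None,
    headers: Optional[Dict[str, str]] = None,
) -> Request:
    """Create a Request, adding a Range header when byte_range is given."""
    all_headers = dict(headers or {})
    if byte_range is not None:
        start, stop = byte_range
        all_headers["Range"] = f"bytes={start}-{stop}"
    return Request(url, headers=all_headers)


request = build_request_sketch("https://example.org/data.gz", byte_range=(0, 99))
range_header = request.get_header("Range")
```

A server that does not support range requests may ignore the header and return the whole resource, so callers should not assume the response is truncated.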

xphyle.urls.parse_url(url_string: str) → Optional[urllib.parse.ParseResult]

Attempts to parse a URL.

Parameters:url_string – String to test.
Returns:A 6-tuple, as described in urlparse, or None if the URL cannot be parsed, or if it lacks a minimum set of attributes. Note that a URL may be valid and still not be openable (for example, if the scheme is not recognized by urlopen).
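
The validation step can be sketched with urllib.parse.urlparse. Requiring a scheme plus either a network location or a path is an assumption about what the minimum set of attributes means, and parse_url_sketch is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import ParseResult, urlparse


def parse_url_sketch(url_string: str) -> Optional[ParseResult]:
    """Return the parsed URL, or None if it lacks a scheme and a location."""
    parsed = urlparse(url_string)
    if not (parsed.scheme and (parsed.netloc or parsed.path)):
        return None
    return parsed


remote = parse_url_sketch("https://example.org/a.gz")
local = parse_url_sketch("file:///tmp/a.gz")
relative = parse_url_sketch("just/a/relative/path")
```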