xphyle package¶
Public API¶
xphyle module¶
The main xphyle methods – xopen, popen, and open_.
-
class
xphyle.
BufferWrapper
(fileobj: Union[os.PathLike, IO, io.IOBase], buffer: Union[_io.StringIO, _io.BytesIO], compression: Union[bool, str] = False, name: str = None, **kwargs)¶ Bases:
xphyle.FileWrapper
Wrapper around a string/bytes buffer.
Parameters: - fileobj – The fileobj to wrap (the raw or wrapped buffer).
- buffer – The raw buffer.
- compression – Compression type.
- close_fileobj – Whether to close the buffer when closing this wrapper.
-
getvalue
() → Union[bytes, str]¶ Returns the contents of the buffer.
-
class
xphyle.
EventListener
(**kwargs)¶ Bases:
typing.Generic
Base class for listener events that can be registered on a FileLikeWrapper.
Parameters: kwargs – keyword arguments to pass through to execute
-
execute
(wrapper: E, **kwargs) → None¶ Handle an event. This method must be implemented by subclasses.
Parameters: - wrapper – The EventManager on which this event was registered.
- kwargs – A union of the keyword arguments passed to the constructor and the __call__ method.
-
-
class
xphyle.
EventManager
¶ Bases:
object
Mixin type for classes that allow registering event listeners.
-
register_listener
(event: Union[str, xphyle.types.EventType], listener: xphyle.EventListener) → None¶ Register an event listener.
Parameters: - event – Event name (currently, only ‘close’ is recognized)
- listener – A listener object, which must be callable with a single argument – this file wrapper.
-
-
class
xphyle.
FileLikeWrapper
(fileobj: Union[IO, io.IOBase], compression: Union[bool, str] = False, close_fileobj: bool = True)¶ Bases:
xphyle.EventManager
,xphyle.types.FileLikeBase
Base class for wrappers around file-like objects. By default, method calls are forwarded to the file object. Adds the following:
1. A simple event system by which registered listeners can respond to file events. Currently, ‘close’ is the only supported event.
2. Wraps file iterators in a progress bar (if configured).
Parameters: - fileobj – The file-like object to wrap.
- compression – Whether the wrapped file is compressed.
- close_fileobj – Whether to close the wrapped file object when closing this wrapper.
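The forwarding and 'close' event behaviour described above can be pictured with a small standard-library sketch. This is not xphyle's implementation: the class name CloseNotifier and its internals are hypothetical, and only the register_listener name and the 'close' event are taken from this page.

```python
import io


class CloseNotifier:
    """Hypothetical stand-in for a wrapper with 'close' listeners (not xphyle code)."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        self._listeners = []

    def register_listener(self, event, listener):
        # Only 'close' is modeled, mirroring the single event described above.
        if event != "close":
            raise ValueError(f"unsupported event: {event}")
        self._listeners.append(listener)

    def __getattr__(self, name):
        # Anything not defined here is forwarded to the wrapped file object.
        return getattr(self._fileobj, name)

    def close(self):
        # Close the wrapped object first, then notify listeners with the wrapper.
        self._fileobj.close()
        for listener in self._listeners:
            listener(self)


closed_names = []
wrapper = CloseNotifier(io.StringIO("data"))
wrapper.register_listener("close", lambda w: closed_names.append(type(w).__name__))
first_chars = wrapper.read(2)  # forwarded to StringIO.read
wrapper.close()
```

Each listener is called with the wrapper as its single argument, which matches the description of register_listener; the order in which xphyle closes the file and fires events is an assumption here.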
-
close
() → None¶ Close the file, close an open iterator, and fire ‘close’ events to any listeners.
-
closed
¶
-
fileno
() → int¶
-
flush
() → None¶
-
isatty
() → bool¶
-
mode
¶
-
name
¶
-
peek
(size: int = 1) → Union[bytes, str]¶ Return bytes/characters from the stream without advancing the position. At most one single read on the raw stream is done to satisfy the call.
Parameters: size – The max number of bytes/characters to return.
Returns: At most size bytes/characters. Unlike io.BufferedReader.peek(), will never return more than size bytes/characters.
Notes
If the file uses multi-byte encoding and N characters are desired, it is up to the caller to request size=2N.
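To illustrate the difference from io.BufferedReader.peek(), here is a minimal sketch that truncates the standard-library result to the requested size. The helper name bounded_peek is hypothetical and the code only models the documented guarantee, not how xphyle implements it.

```python
import io


def bounded_peek(stream: io.BufferedReader, size: int = 1) -> bytes:
    # BufferedReader.peek() may return more bytes than requested,
    # so the result is sliced to at most `size` bytes.
    return stream.peek(size)[:size]


reader = io.BufferedReader(io.BytesIO(b"ACGTACGT"))
head = bounded_peek(reader, 3)
position_after_peek = reader.tell()  # peeking does not advance the position
```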
-
read
(size: int = -1) → bytes¶
-
readable
() → bool¶
-
readline
(size: int = -1) → Union[bytes, str]¶
-
readlines
(hint: int = -1) → List[Union[bytes, str]]¶
-
seek
(offset, whence: int = 0) → int¶
-
seekable
() → bool¶
-
tell
() → int¶
-
truncate
(size: int = None) → int¶
-
writable
() → bool¶
-
write
(string: Union[bytes, str]) → int¶
-
writelines
(lines: Iterable[Union[bytes, str]]) → None¶
-
class
xphyle.
FileWrapper
(source: Union[os.PathLike, IO, io.IOBase], mode: Union[str, xphyle.types.FileMode] = 'w', compression: Union[bool, str] = False, name: Union[str, pathlib.PurePath] = None, close_fileobj: bool = True, **kwargs)¶ Bases:
xphyle.FileLikeWrapper
Wrapper around a file object.
Parameters: - source – Path or file object.
- mode – File open mode.
- compression – Compression type.
- name – Use an alternative name for the file.
- kwargs – Additional arguments to pass to xopen.
-
name
¶
-
path
¶ The source path.
-
class
xphyle.
Process
(args, stdin: Union[os.PathLike, IO, io.IOBase, int] = None, stdout: Union[os.PathLike, IO, io.IOBase, int] = None, stderr: Union[os.PathLike, IO, io.IOBase, int] = None, **kwargs)¶ Bases:
subprocess.Popen
,xphyle.EventManager
,xphyle.types.FileLikeBase
,typing.Iterable
Subclass of subprocess.Popen with the following additions:
- Provides Process.wrap_pipes for wrapping stdin/stdout/stderr (e.g. to send compressed data to a process’ stdin or read compressed data from its stdout/stderr).
- Provides Process.close for properly closing stdin/stdout/stderr streams and terminating the process.
- Implements required methods to make objects ‘file-like’.
Parameters: - args – Positional arguments, passed to subprocess.Popen constructor.
- stdin, stdout, stderr – Identical to the same arguments to subprocess.Popen.
- kwargs – Keyword arguments, passed to subprocess.Popen constructor.
-
check_valid_returncode
(valid: Container[int] = (0, None, <Signals.SIGPIPE: 13>, 141))¶ Check that the returncode does not have a value associated with an error state.
Raises: IOError if returncode is associated with an error state.
-
close
() → None¶
-
close1
(timeout: float = None, raise_on_error: bool = False, record_output: bool = False, terminate: bool = False) → Optional[int]¶ Close stdin/stdout/stderr streams, wait for process to finish, and return the process return code.
Parameters: - timeout – time in seconds to wait for stream to close; negative value or None waits indefinitely.
- raise_on_error – Whether to raise an exception if the process returns an error.
- record_output – Whether to store contents of stdout and stderr in place of the actual streams after closing them.
- terminate – If True and timeout is a positive integer, the process is terminated if it doesn’t finish within timeout seconds.
Notes
If :attribute:`record_output` is True, and if stdout/stderr is a PIPE, any contents are read and stored as the value of :attribute:`stdout`/:attribute:`stderr`. Otherwise the data is lost.
Returns: The process returncode.
Raises: IOError if raise_on_error is True and the process returns an error code.
-
closed
¶ Whether the Process has been closed.
-
communicate
(inp: Union[bytes, str] = None, timeout: float = None) → Tuple[IO, IO]¶ Send input to stdin, wait for process to terminate, return results.
Parameters: - inp – Input to send to stdin.
- timeout – Time to wait for process to finish.
Returns: Tuple of (stdout, stderr).
-
flush
() → None¶ Flushes stdin if there is one.
-
get_reader
(which: str = None) → Union[IO, io.IOBase]¶ Returns the stream for reading data from stdout/stderr.
Parameters: which – Which stream to read from, ‘stdout’ or ‘stderr’. If None, stdout is used if it exists, otherwise stderr.
Returns: The specified stream, or None if the stream doesn’t exist.
-
get_readers
()¶ Returns (stdout, stderr) tuple.
-
get_writer
() → Union[IO, io.IOBase]¶ Returns the stream for writing to stdin.
-
is_wrapped
(name: str) → bool¶ Returns True if the stream corresponding to name is wrapped.
Parameters: name – One of ‘stdin’, ‘stdout’, ‘stderr’
-
mode
¶
-
name
¶
-
read
(size: int = -1, which: str = None) → bytes¶ Read size bytes/characters from stdout or stderr.
Parameters: - size – Number of bytes/characters to read.
- which – Which stream to read from, ‘stdout’ or ‘stderr’. If None, stdout is used if it exists, otherwise stderr.
Returns: The bytes/characters read from the specified stream.
-
readable
() → bool¶ Returns True if this Popen has stdout and/or stderr, otherwise False.
-
readline
(hint: int = -1, which: str = None) → Union[bytes, str]¶
-
readlines
(sizehint: int = -1, which: str = None) → List[Union[bytes, str]]¶
-
wrap_pipes
(**kwargs) → None¶ Wrap stdin/stdout/stderr PIPE streams using xopen.
Parameters: kwargs – for each of ‘stdin’, ‘stdout’, ‘stderr’, a dict providing arguments to xopen describing how the stream should be wrapped.
-
writable
() → bool¶ Returns True if this Popen has stdin, otherwise False.
-
write
(data: Union[bytes, str]) → int¶ Write data to stdin.
Parameters: data – The data to write; must be bytes if stdin is a byte stream or string if stdin is a text stream. Returns: Number of bytes/characters written
-
writelines
(lines: Iterable[Union[bytes, str]]) → None¶
-
class
xphyle.
StdWrapper
(stream: Union[IO, io.IOBase], compression: Union[bool, str] = False)¶ Bases:
xphyle.FileLikeWrapper
Wrapper around stdin/stdout/stderr.
Parameters: - stream – The stream to wrap.
- compression – Compression type.
-
closed
¶
-
xphyle.
configure
(default_xopen_context_wrapper: Optional[bool] = None, progress: Optional[bool] = None, progress_wrapper: Optional[Callable[..., Iterable]] = None, system_progress: Optional[bool] = None, system_progress_wrapper: Union[str, Sequence[str], None] = None, threads: Optional[int] = None, executable_path: Union[pathlib.PurePath, Sequence[pathlib.PurePath], None] = None) → None¶ Configure xphyle.
Parameters: - default_xopen_context_wrapper – Whether to wrap files opened by xopen in FileLikeWrappers by default (when xopen’s context_wrapper parameter is None).
- progress – Whether to wrap long-running operations with a progress bar
- progress_wrapper – Specify a non-default progress wrapper
- system_progress – Whether to use progress bars for system-level
- system_progress_wrapper – Specify a non-default system progress wrapper
- threads – The number of threads that can be used by compression formats that support parallel compression/decompression. Set to None or a number < 1 to automatically initialize to the number of cores on the local machine.
- executable_path – List of paths where xphyle should look for system executables. These will be searched before the default system path.
-
xphyle.
get_compressor
(name_or_path: Union[str, pathlib.PurePath]) → Optional[xphyle.formats.CompressionFormat]¶ Returns the CompressionFormat for the given path or compression type name.
-
xphyle.
guess_file_format
(path: pathlib.PurePath) → str¶ Try to guess the file format, first from the extension, and then from the header bytes.
Parameters: path – The path to the file.
Returns: The guessed format, or None if one could not be determined.
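The two-step strategy (extension first, then header bytes) can be sketched with the standard library alone. The lookup tables and the function name guess_format below are hypothetical and cover only three formats; xphyle's own format registry is not shown on this page.

```python
import gzip
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical lookup tables for three formats; not xphyle's registry.
EXTENSIONS = {".gz": "gzip", ".bz2": "bz2", ".xz": "lzma"}
MAGIC = {b"\x1f\x8b": "gzip", b"BZh": "bz2", b"\xfd7zXZ\x00": "lzma"}


def guess_format(path: Path) -> Optional[str]:
    # Step 1: trust a recognized extension without touching the file.
    fmt = EXTENSIONS.get(path.suffix.lower())
    if fmt is not None:
        return fmt
    # Step 2: fall back to comparing the first bytes against known signatures.
    with open(path, "rb") as handle:
        header = handle.read(max(len(magic) for magic in MAGIC))
    for magic, name in MAGIC.items():
        if header.startswith(magic):
            return name
    return None


with tempfile.TemporaryDirectory() as tmp:
    hidden = Path(tmp) / "reads.dat"  # gzip data behind an unhelpful extension
    hidden.write_bytes(gzip.compress(b"ACGT\n"))
    plain = Path(tmp) / "notes.txt"
    plain.write_text("hello\n")
    guesses = (guess_format(Path("x.bz2")), guess_format(hidden), guess_format(plain))
```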
-
xphyle.
open_
(target: Union[os.PathLike, IO, io.IOBase, bytes, str, Type[Union[bytes, str]]], mode: Union[str, xphyle.types.FileMode] = None, errors: bool = True, wrap_fileobj: bool = True, **kwargs) → Generator[[Union[IO, io.IOBase], None], None]¶ Context manager that frees you from checking if an argument is a path or a file object. Calls
xopen to open files.
Parameters: - target – A relative or absolute path, a URL, a system command, a file-like object, or bytes or str to indicate a writeable byte/string buffer.
- mode – The file open mode.
- errors – Whether to raise an error if there is a problem opening the file. If False, yields None when there is an error.
- wrap_fileobj – If target is a file-like object, this parameter determines whether it will be passed to xopen for wrapping (True) or returned directly (False). If False, any kwargs are ignored.
- kwargs – Additional args to pass through to xopen (if target is a path).
Yields: A file-like object, or None if errors is False and there is a problem opening the file.
Examples
    with open_('myfile') as infile:
        print(next(infile))

    fileobj = open('myfile')
    with open_(fileobj) as infile:
        print(next(infile))
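The idea of accepting either a path or an already-open file object can be sketched with contextlib from the standard library. The name open_path_or_file is hypothetical, and the choice to leave a caller-supplied file object open on exit is an assumption of this sketch rather than documented xphyle behaviour.

```python
import contextlib
import io
import tempfile
from pathlib import Path


@contextlib.contextmanager
def open_path_or_file(target, mode="r"):
    if hasattr(target, "read") or hasattr(target, "write"):
        # Already a file-like object: hand it back and leave it open (assumption).
        yield target
    else:
        # Treat anything else as a path and close it when the block exits.
        with open(target, mode) as handle:
            yield handle


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "myfile"
    path.write_text("first\nsecond\n")
    with open_path_or_file(path) as infile:
        from_path = next(infile)

buffer = io.StringIO("first\nsecond\n")
with open_path_or_file(buffer) as infile:
    from_object = next(infile)
```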
-
xphyle.
popen
(args: Iterable, stdin: Union[os.PathLike, IO, io.IOBase, int, dict, Tuple[Union[os.PathLike, IO, io.IOBase, int], Union[str, xphyle.types.FileMode, dict]]] = None, stdout: Union[os.PathLike, IO, io.IOBase, int, dict, Tuple[Union[os.PathLike, IO, io.IOBase, int], Union[str, xphyle.types.FileMode, dict]]] = None, stderr: Union[os.PathLike, IO, io.IOBase, int, dict, Tuple[Union[os.PathLike, IO, io.IOBase, int], Union[str, xphyle.types.FileMode, dict]]] = None, shell: bool = False, **kwargs) → xphyle.Process¶ Opens a subprocess, using xopen to open input/output streams.
Parameters: - args – argument string or tuple of arguments.
- stdin, stdout, stderr – For each stream: a file to use, PIPE to open a pipe, a dict to pass xopen args for a PIPE, a tuple of (path, mode), or a tuple of (path, dict), where the dict contains parameters to pass to xopen.
- shell – The ‘shell’ arg from subprocess.Popen.
- kwargs – additional arguments to subprocess.Popen.
Returns: A Process object, which is a subclass of subprocess.Popen.
-
xphyle.
xopen
(target: Union[os.PathLike, IO, io.IOBase, bytes, str, Type[Union[bytes, str]]], mode: Union[str, xphyle.types.FileMode] = None, compression: Union[bool, str] = None, use_system: bool = True, allow_subprocesses: bool = True, context_wrapper: bool = None, file_type: xphyle.types.FileType = None, validate: bool = True, overwrite: bool = True, close_fileobj: bool = True, **kwargs) → Union[IO, io.IOBase]¶ Replacement for the builtin open function that can also open URLs and subprocesses, and automatically handles compressed files.
Parameters: - target – A relative or absolute path, a URL, a system command, a file-like object, or bytes or str to indicate a writeable byte/string buffer.
- mode – Some combination of the access mode (‘r’, ‘w’, ‘a’, or ‘x’) and the open mode (‘b’ or ‘t’). If the latter is not given, ‘t’ is used by default.
- compression – If None or True, compression type (if any) will be determined automatically. If False, no attempt will be made to determine compression type. Otherwise this must specify the compression type (e.g. ‘gz’). See xphyle.compression for details. Note that compression will not be guessed for ‘-‘ (stdin).
- use_system – Whether to attempt to use system-level compression programs.
- allow_subprocesses – Whether to allow path to be a subprocess (e.g. ‘|cat’). There are security risks associated with allowing users to run arbitrary system commands.
- context_wrapper – If True, the file is wrapped in a FileLikeWrapper subclass before returning (FileWrapper for files/URLs, StdWrapper for STDIN/STDOUT/STDERR). If None, the default value (set using :method:`configure`) is used.
- file_type – A FileType; explicitly specifies the file type. By default the file type is detected, but auto-detection might make mistakes, e.g. when a local file name contains a colon (‘:’).
- validate – Ensure that the user-specified compression format matches the format guessed from the file extension or magic bytes.
- overwrite – For files opened in write mode, whether to overwrite existing files (True).
- close_fileobj – When path is a file-like object / file_type is FileType.FILELIKE, and context_wrapper is True, whether to close the underlying file when closing the wrapper.
- kwargs – Additional keyword arguments to pass to open.
- path is interpreted as follows:
- If starts with ‘|’, it is assumed to be a system command
- If a file-like object, it is used as-is
- If one of STDIN, STDOUT, STDERR, the appropriate sys stream is used
- If parseable by xphyle.urls.parse_url(), it is assumed to be a URL
- If file_type == FileType.BUFFER and path is a string or bytes and mode is readable, a new StringIO/BytesIO is created with ‘path’ passed to its constructor.
- Otherwise it is assumed to be a local file
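The order of these checks can be sketched as a small classifier. The function classify_target is hypothetical, the URL test uses urllib.parse with an assumed list of schemes in place of xphyle.urls.parse_url(), the '-' and '_' markers are taken from the stdin/stdout/stderr notation used elsewhere on this page, and the buffer case is left out.

```python
import io
from urllib.parse import urlparse

URL_SCHEMES = ("http", "https", "ftp")  # assumed set, for illustration only


def classify_target(target) -> str:
    # The checks run in the same order as the list above.
    if isinstance(target, str) and target.startswith("|"):
        return "process"
    if hasattr(target, "read") or hasattr(target, "write"):
        return "filelike"
    if target in ("-", "_"):
        return "std"
    if isinstance(target, str) and urlparse(target).scheme in URL_SCHEMES:
        return "url"
    return "local"


kinds = [
    classify_target("|cat"),
    classify_target(io.StringIO()),
    classify_target("-"),
    classify_target("https://example.com/reads.gz"),
    classify_target("sample:1.txt"),  # a colon in a local name must not look like a URL
]
```

Restricting the URL test to known schemes is one way to avoid the colon-in-filename mistake that the file_type parameter exists to work around.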
If use_system is True and the file is compressed, the file is opened with a pipe to the system-level compression program (e.g. gzip for ‘.gz’ files) if possible, otherwise the corresponding python library is used.
Returns: A Process if file_type is PROCESS, or if file_type is None and path starts with ‘|’. Otherwise, an opened file-like object. If context_wrapper is True, this will be a subclass of FileLikeWrapper.
Raises: ValueError if:
- compression is True and compression format cannot be determined
- the specified compression format is invalid
- validate is True and the specified compression format is not the actual format of the file
- the path or mode are invalid
xphyle.utils module¶
A collection of convenience methods for reading, writing, and otherwise
managing files. All of these functions are ‘safe’, meaning that if you pass
errors=False
and there is a problem opening the file, the error will be
handled gracefully.
-
class
xphyle.utils.
CompressOnClose
(**kwargs)¶ Bases:
xphyle.EventListener
Compress a file after it is closed.
-
compressed_path
= None¶
-
execute
(wrapper: xphyle.FileWrapper, **kwargs) → None¶ Handle an event. This method must be implemented by subclasses.
Parameters: - wrapper – The EventManager on which this event was registered.
- kwargs – A union of the keyword arguments passed to the constructor and the __call__ method.
-
-
class
xphyle.utils.
CycleFileOutput
(files: Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]] = None, char_mode: CharMode = None, **kwargs)¶ Bases:
xphyle.utils.FileOutput
Alternate each line between files.
Parameters: - files – A list of files.
- char_mode – The character mode.
-
class
xphyle.utils.
FileInput
(files: Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]] = None, char_mode: CharMode = None)¶ Bases:
xphyle.utils.FileManager
,typing.Iterator
Similar to Python’s fileinput module, but uses xopen to open files. Currently only supports sequential line-oriented access via next or readline.
Parameters: - files – List of files.
- char_mode – text or binary mode.
Notes
Default values are not allowed for generically typed parameters. In a future version, char_mode will default to None and it will be required to specify the mode, or use one of the convenience methods (:method:`textinput` or :method:`byteinput`).
-
add
(path_or_file: Union[os.PathLike, IO, io.IOBase], key: Optional[Any] = None, **kwargs) → None¶ Overrides FileManager.add() to prevent file-specific open args.
-
filekey
¶ The key of the file currently being read.
-
filename
¶ The name of the file currently being read.
-
finished
¶ Whether all data has been read from all files.
-
lineno
¶ The total number of lines that have been read so far from all files.
-
readline
() → CharMode¶ Read the next line from the current file (advancing to the next file if necessary and possible).
Returns: The next line, or the undefined string if self.finished==True.
-
class
xphyle.utils.
FileManager
(files: Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]] = None, header=None, **kwargs)¶ Bases:
collections.abc.Sized
Dict-like container for files. Files are opened lazily (upon first request) using xopen.
Parameters: - files – An iterable of files to add. Each item can either be a string path or a (key, fileobj) tuple.
- header – A header to write when opening writable files.
- kwargs – Default arguments to pass to xopen.
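Lazy opening means a file handle is created only when a key is first requested. The sketch below shows that pattern with the built-in open; the class LazyFiles is hypothetical, uses the file name as the key, and omits headers, tuples of (key, fileobj), and xopen arguments.

```python
import tempfile
from pathlib import Path


class LazyFiles:
    """Hypothetical dict-like container that opens files on first access."""

    def __init__(self, paths, mode="r"):
        # Key each path by its file name (an assumption made for this sketch).
        self._paths = {Path(p).name: Path(p) for p in paths}
        self._handles = {}
        self._mode = mode

    def __len__(self):
        return len(self._paths)

    def get(self, key):
        # Open on first request, then reuse the same handle.
        if key not in self._handles:
            self._handles[key] = open(self._paths[key], self._mode)
        return self._handles[key]

    def close(self):
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()


with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt"):
        (Path(tmp) / name).write_text(name + "\n")
    files = LazyFiles(sorted(Path(tmp).iterdir()))
    opened_before = len(files._handles)  # nothing is open yet
    first_line = files.get("a.txt").readline()
    opened_after = len(files._handles)  # only the requested file is open
    total = len(files)
    files.close()
```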
-
add
(path_or_file: Union[os.PathLike, IO, io.IOBase], key: Optional[Any] = None, **kwargs) → None¶ Add a file.
Parameters: - path_or_file – Path or file object. If this is a path, the file will be opened with the specified mode.
- key – Dict key. Defaults to the file name.
- kwargs – Arguments to pass to xopen. These override any keyword arguments passed to the FileManager’s constructor.
-
add_all
(files: Union[Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], Dict[Any, Union[os.PathLike, IO, io.IOBase]]], **kwargs) → None¶ Add all files from an iterable or dict.
Parameters: - files – An iterable or dict of files to add. If an iterable, each item can either be a string path or a (key, fileobj) tuple.
- kwargs – Additional arguments to pass to add.
-
close
() → None¶ Close all files being tracked.
-
get
(key: Any) → Union[IO, io.IOBase, None]¶ Get the file object associated with a path. If the file is not already open, it is first opened with xopen.
Parameters: key – The file name/key. Returns: The opened file.
-
get_path
(key: Any) → pathlib.PurePath¶ Returns the file path associated with a key.
Parameters: key – The key to resolve. Returns: The file path.
-
iter_files
() → Generator[[Tuple[Any, Union[IO, io.IOBase]], None], None]¶ Iterates over all (key, file) pairs in the order they were added.
-
keys
¶ Returns a list of all keys in the order they were added.
-
paths
¶ Returns a list of all paths in the order they were added.
-
xphyle.utils.
FileOrFilesArg
= typing.Union[os.PathLike, typing.IO, io.IOBase, typing.Iterable[typing.Union[os.PathLike, typing.IO, io.IOBase, typing.Tuple[typing.Any, typing.Union[os.PathLike, typing.IO, io.IOBase]]]], NoneType]¶ A path or multiple files.
-
class
xphyle.utils.
FileOutput
(files: Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]] = None, access: Union[str, xphyle.types.ModeAccess] = 'w', char_mode: Optional[CharMode] = None, linesep: Optional[CharMode] = None, encoding: str = 'utf-8', header: Optional[CharMode] = None)¶ Bases:
xphyle.utils.FileManager
,typing.Generic
Base class for file manager that writes to multiple files.
Parameters: - files – The list of files to open.
- char_mode – The CharMode.
- access – How to open the output files (‘w’, ‘a’, ‘x’).
- linesep – The line separator (type must match char_mode).
- encoding – Default character encoding to use.
- header – Default file header to write when opening output files.
Notes
Default values for generically typed parameters are not allowed. In a future version, char_mode and linesep will default to None and must be explicitly defined.
-
write
(data: Any, detect_newlines: bool = True) → int¶ Writes data to the output.
Parameters: - data – The data to write; will be converted to string/bytes.
- detect_newlines – If True, data is split on linesep and the resulting lines are written using writelines, otherwise data is written using writeline.
Returns: The number of characters written.
-
writeline
(line: Union[bytes, str, None] = None) → Tuple[int, int]¶ Write a line to the output(s).
Parameters: line – The line to write. Returns: The tuple (lines_written, chars_written).
-
writelines
(lines: Iterable[Union[bytes, str]]) → Tuple[int, int]¶ Write an iterable of lines to the output(s).
Parameters: lines – An iterable of lines to write. Returns: The tuple (lines_written, chars_written).
-
xphyle.utils.
FilesArg
¶ alias of
typing.Iterable
-
class
xphyle.utils.
MoveOnClose
(**kwargs)¶ Bases:
xphyle.EventListener
Move a file after it is closed.
-
execute
(wrapper: xphyle.FileWrapper, dest: pathlib.PurePath = None, **kwargs) → None¶ Handle an event. This method must be implemented by subclasses.
Parameters: - wrapper – The EventManager on which this event was registered.
- kwargs – A union of the keyword arguments passed to the constructor and the __call__ method.
-
-
class
xphyle.utils.
NCycleFileOutput
(files: Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]] = None, char_mode: CharMode = None, lines_per_file: int = 1, **kwargs)¶ Bases:
xphyle.utils.FileOutput
Alternate output lines between files.
Parameters: - files – A list of files.
- char_mode – The character mode.
- lines_per_file – How many lines to write to a file before moving on to the next file.
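A minimal sketch of the cycling behaviour, using in-memory buffers in place of real files. The function write_n_cycle is hypothetical and only borrows the lines_per_file name from the signature above; it is not the class's actual write path.

```python
import io
from itertools import cycle


def write_n_cycle(lines, outputs, lines_per_file=1):
    targets = cycle(outputs)
    current = next(targets)
    written = 0
    for line in lines:
        # Move to the next output once the current one has its share of lines.
        if written == lines_per_file:
            current = next(targets)
            written = 0
        current.write(line + "\n")
        written += 1


first, second = io.StringIO(), io.StringIO()
write_n_cycle(["a", "b", "c", "d", "e"], [first, second], lines_per_file=2)
```

With lines_per_file=1 this reduces to strict alternation, which is the behaviour described for CycleFileOutput.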
-
class
xphyle.utils.
PatternFileOutput
(filename_pattern: Optional[str] = None, char_mode: Optional[CharMode] = None, token_func: Callable[Union[bytes, str], Dict[Union[bytes, str], Any]] = <function PatternFileOutput.<lambda>>, **kwargs)¶ Bases:
xphyle.utils.TokenFileOutput
Use a callable to generate filenames based on data in lines.
Parameters: - filename_pattern – The pattern of file names to create. Should have a single token (‘{}’ or ‘{0}’) that is replaced with the file index.
- char_mode – The character mode.
- token_func – Function to extract token(s) from lines in file. By default this is the identity function, which is almost never what you want.
- kwargs – Additional args.
-
xphyle.utils.
PatternOrFileOrFilesArg
= typing.Union[str, os.PathLike, typing.IO, io.IOBase, typing.Iterable[typing.Union[os.PathLike, typing.IO, io.IOBase, typing.Tuple[typing.Any, typing.Union[os.PathLike, typing.IO, io.IOBase]]]], NoneType]¶ A pattern, path, file, or multiple files.
-
class
xphyle.utils.
RemoveOnClose
(**kwargs)¶ Bases:
xphyle.EventListener
Remove a file after it is closed.
-
execute
(wrapper: xphyle.FileWrapper, **kwargs) → None¶ Handle an event. This method must be implemented by subclasses.
Parameters: - wrapper – The EventManager on which this event was registered.
- kwargs – A union of the keyword arguments passed to the constructor and the __call__ method.
-
-
class
xphyle.utils.
RollingFileOutput
(filename_pattern: Union[str, Iterable[str]] = None, char_mode: CharMode = None, lines_per_file: int = 1, **kwargs)¶ Bases:
xphyle.utils.TokenFileOutput
Write up to lines_per_file lines to a file before opening the next file. File names are created from a pattern.
Parameters: - filename_pattern – The pattern of file names to create. Should have a single token (‘{}’ or ‘{0}’) that is replaced with the file index.
- char_mode – The character mode.
- lines_per_file – The max number of lines to write to each file.
- kwargs – Additional args.
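Rolling output can be sketched with str.format and the built-in open. The function write_rolling is hypothetical, and starting the file index at 0 is an assumption of this sketch; the page does not state where xphyle begins counting.

```python
import tempfile
from pathlib import Path


def write_rolling(lines, filename_pattern, lines_per_file=1):
    paths = []
    handle = None
    try:
        for index, line in enumerate(lines):
            # Start a new file every `lines_per_file` lines.
            if index % lines_per_file == 0:
                if handle is not None:
                    handle.close()
                path = Path(filename_pattern.format(len(paths)))
                paths.append(path)
                handle = open(path, "w")
            handle.write(line + "\n")
    finally:
        if handle is not None:
            handle.close()
    return paths


with tempfile.TemporaryDirectory() as tmp:
    pattern = str(Path(tmp) / "part{}.txt")
    created = write_rolling(["a", "b", "c"], pattern, lines_per_file=2)
    names = [p.name for p in created]
    contents = [p.read_text() for p in created]
```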
-
class
xphyle.utils.
TeeFileOutput
(files: Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]] = None, access: Union[str, xphyle.types.ModeAccess] = 'w', char_mode: Optional[CharMode] = None, linesep: Optional[CharMode] = None, encoding: str = 'utf-8', header: Optional[CharMode] = None)¶ Bases:
xphyle.utils.FileOutput
Write output to multiple files simultaneously.
-
class
xphyle.utils.
TokenFileOutput
(filename_pattern: Optional[str] = None, char_mode: Optional[CharMode] = None, **kwargs)¶ Bases:
xphyle.utils.FileOutput
Generate file names according to a pattern.
Parameters: - filename_pattern – The pattern of file names to create. Should have a single token (‘{}’ or ‘{0}’) that is replaced with the file index.
- char_mode – The character mode.
- kwargs – Additional args.
-
xphyle.utils.
byteinput
(files: Union[os.PathLike, IO, io.IOBase, Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], None] = None)¶ Convenience method that creates a new
FileInput in bytes mode.
Parameters: files – The files to open. If None, files passed on the command line are used, or STDIN if there are no command line arguments.
Returns: A FileInput[bytes] instance.
-
xphyle.utils.
byteoutput
(files: Union[os.PathLike, IO, io.IOBase, Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], None] = None, file_output_type: Callable[..., xphyle.utils.FileOutput[bytes]] = xphyle.utils.TeeFileOutput[bytes], **kwargs) → xphyle.utils.FileOutput[bytes]¶ Convenience function to create a fileoutput in bytes mode.
Parameters: - files – The files to write to.
- file_output_type – The specific subclass of FileOutput to create.
- kwargs – additional arguments to pass to the FileOutput constructor.
Returns: A FileOutput instance.
-
xphyle.utils.
compress_file
(source_file: Union[os.PathLike, IO, io.IOBase], compressed_file: Union[os.PathLike, IO, io.IOBase] = None, compression: Union[bool, str] = None, keep: bool = True, compresslevel: int = None, use_system: bool = True, **kwargs) → pathlib.Path¶ Compress an existing file, either in-place or to a separate file.
Parameters: - source_file – Path or file-like object to compress.
- compressed_file – The compressed path or file-like object. If None, compression is performed in-place. If True, file name is determined from source_file and the decompressed file is retained.
- compression – If True, guess compression format from the file name, otherwise the name of any supported compression format.
- keep – Whether to keep the source file.
- compresslevel – Compression level.
- use_system – Whether to try to use system-level compression.
- kwargs – Additional arguments to pass to the open method when opening the compressed file.
Returns: The path to the compressed file.
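For orientation, the effect of compressing a file to a separate path and optionally dropping the source can be reproduced with the gzip module. The helper gzip_file is hypothetical, handles gzip only, and derives the output name by appending ‘.gz’, which is an assumption rather than xphyle's naming rule.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_file(source: Path, keep: bool = True) -> Path:
    # Derive the destination name by appending '.gz' (assumption of this sketch).
    dest = source.with_name(source.name + ".gz")
    with open(source, "rb") as src, gzip.open(dest, "wb") as out:
        shutil.copyfileobj(src, out)
    if not keep:
        source.unlink()
    return dest


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "reads.txt"
    source.write_bytes(b"ACGT\n" * 100)
    compressed = gzip_file(source, keep=False)
    compressed_name = compressed.name
    source_kept = source.exists()
    round_trip = gzip.decompress(compressed.read_bytes())
```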
-
xphyle.utils.
decompress_file
(compressed_file: Union[os.PathLike, IO, io.IOBase], dest_file: Union[os.PathLike, IO, io.IOBase] = None, compression: Union[bool, str] = None, keep: bool = True, use_system: bool = True, **kwargs) → pathlib.Path¶ Decompress an existing file, either in-place or to a separate file.
Parameters: - compressed_file – Path or file-like object to decompress.
- dest_file – Path or file-like object for the decompressed file. If None, file will be decompressed in-place. If True, file will be decompressed to a new file (and the compressed file retained) whose name is determined automatically.
- compression – None or True, to guess compression format from the file name, or the name of any supported compression format.
- keep – Whether to keep the source file.
- use_system – Whether to try to use system-level compression
- kwargs – Additional arguments to pass to the open method when opening the compressed file.
Returns: The path of the decompressed file.
-
xphyle.utils.
exec_process
(*args, inp: Union[bytes, str] = None, timeout: int = None, **kwargs) → xphyle.Process¶ Shortcut to execute a process, wait for it to terminate, and return the results.
Parameters: - args – Positional arguments to popen.
- inp – String/bytes to write to process input stream.
- timeout – Time to wait for process to complete.
- kwargs – Keyword arguments to popen.
Returns: A terminated Process. The contents of stdout and stderr are recorded in the stdout and stderr attributes.
-
xphyle.utils.
fileinput
(files: Union[os.PathLike, IO, io.IOBase, Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], None] = None, char_mode: CharMode = None) → xphyle.utils.FileInput[CharMode]¶ Convenience method that creates a new
FileInput.
Parameters: - files – The files to open. If None, files passed on the command line are used, or STDIN if there are no command line arguments.
- char_mode – The default read mode (‘t’ for text or b’b’ for binary).
Returns: A FileInput instance.
Notes
Default values are not allowed for generically typed parameters. Use :method:`textinput` or :method:`byteinput` instead.
-
xphyle.utils.
fileoutput
(files: Union[str, os.PathLike, IO, io.IOBase, Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], None] = None, char_mode: CharMode = None, linesep: CharMode = None, encoding: str = 'utf-8', file_output_type: Callable[..., xphyle.utils.FileOutput[CharMode]] = xphyle.utils.TeeFileOutput[~CharMode], **kwargs) → xphyle.utils.FileOutput[CharMode]¶ Convenience function to create a fileoutput.
Parameters: - files – The files to write to. Can include ‘-‘/’_’ for stdout/stderr.
- char_mode – The write mode (‘t’ or b’b’).
- linesep – The separator to use when writing lines.
- encoding – The default file encoding to use.
- file_output_type – The specific subclass of FileOutput to create.
- kwargs – additional arguments to pass to the FileOutput constructor.
Returns: A FileOutput instance.
Notes
Default values are not allowed for generically typed parameters. Use :method:`textoutput` or :method:`byteoutput` instead.
-
xphyle.utils.
linecount
(path_or_file: Union[os.PathLike, IO, io.IOBase], linesep: Optional[bytes] = None, buffer_size: int = 1048576, **kwargs) → int¶ Fastest pythonic way to count the lines in a file.
Parameters: - path_or_file – File object, or path to the file.
- linesep – Line delimiter, specified as a byte string (e.g. b’\n’).
- buffer_size – How many bytes to read at a time (1 MiB by default).
- kwargs – Additional arguments to pass to the file open method.
Returns: The number of lines in the file. Blank lines (including the last line in the file) are included.
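Counting lines by reading fixed-size binary chunks and counting separators avoids decoding or splitting the data. The sketch below is a stand-in, not xphyle's code: count_lines is a hypothetical name, it accepts only single-byte separators, and counting a final unterminated line as a line is an assumption.

```python
import io


def count_lines(stream, linesep=b"\n", buffer_size=1048576):
    if len(linesep) != 1:
        raise ValueError("this sketch only handles single-byte separators")
    count = 0
    last = linesep  # treat an empty stream as ending cleanly
    while True:
        chunk = stream.read(buffer_size)
        if not chunk:
            break
        count += chunk.count(linesep)
        last = chunk[-1:]
    # Count a trailing line that has no separator (assumption of this sketch).
    if last != linesep:
        count += 1
    return count


terminated = count_lines(io.BytesIO(b"a\n\nb\n"), buffer_size=2)
unterminated = count_lines(io.BytesIO(b"a\nb"))
empty = count_lines(io.BytesIO(b""))
```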
-
xphyle.utils.
read_bytes
(path_or_file: Union[os.PathLike, IO, io.IOBase], chunksize: int = 1024, **kwargs) → Generator[[bytes, None], None]¶ Iterate over a file in chunks. The mode will always be overridden to ‘rb’.
Parameters: - path_or_file – Path to the file, or a file-like object.
- chunksize – Number of bytes to read at a time.
- kwargs – Additional arguments to pass to :method:`xphyle.open_`.
Yields: Chunks of the input file as bytes. Each chunk except the last should be of size chunksize.
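The chunking behaviour described here (every chunk is chunksize bytes except possibly the last) can be sketched as a plain generator. The name read_chunks_sketch is hypothetical and the code works on an already opened binary file object; it does not reproduce xphyle's file opening.

```python
import io


def read_chunks_sketch(fileobj, chunksize=1024):
    """Yield successive chunks of at most chunksize bytes."""
    while True:
        chunk = fileobj.read(chunksize)
        if not chunk:  # end of file
            return
        yield chunk


chunks = list(read_chunks_sketch(io.BytesIO(b"abcdefghij"), chunksize=4))
```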
-
xphyle.utils.
read_delimited
()¶ Iterate over rows in a delimited file.
Parameters: - path_or_file – Path to the file, or a file-like object.
- sep – The field delimiter.
- header – Either True or False to specify whether the file has a header, or a sequence of column names.
- converters – callable, or iterable of callables, to call on each value.
- yield_header – If header == True, whether the first row yielded should be the header row.
- row_type – The collection type to return for each row: tuple, list, or dict.
- kwargs – additional arguments to pass to csv.reader.
Yields: Rows of the delimited file. If header==True, the first row yielded is the header row, and its type is always a list. Converters are not applied to the header row.
-
xphyle.utils.
read_delimited_as_dict
(path_or_file: Union[os.PathLike, IO, io.IOBase], sep: str = '\t', header: Union[bool, Sequence[str]] = False, key: Union[int, str, Callable[Sequence[str], Any]] = 0, **kwargs) → Dict[Any, Any]¶ Parse rows in a delimited file and add rows to a dict based on a specified key index or function.
Parameters: - path_or_file – Path to the file, or a file-like object.
- sep – Field delimiter.
- header – If True, read the header from the first line of the file, otherwise a list of column names.
- key – The column to use as a dict key, or a function to extract the key from the row. If a string value, header must be specified. All values must be unique, or an exception is raised.
- kwargs – Additional arguments to pass to read_delimited.
Returns: A dict with as many elements as rows in the file.
Raises: Exception if a duplicate key is generated.
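The key-based indexing and the duplicate check can be sketched with csv.reader. This is an illustration only: rows_by_key_sketch is a hypothetical helper, it ignores headers, and raising ValueError for duplicates is an assumption (the documentation above only says that an exception is raised).

```python
import csv
import io


def rows_by_key_sketch(fileobj, sep="\t", key=0):
    """Index rows of a delimited file by a column index or a key function."""
    result = {}
    for row in csv.reader(fileobj, delimiter=sep):
        k = key(row) if callable(key) else row[key]
        if k in result:
            # Assumed exception type; keys must be unique.
            raise ValueError(f"Duplicate key: {k!r}")
        result[k] = row
    return result


table = rows_by_key_sketch(io.StringIO("a\t1\nb\t2\n"))
by_value = rows_by_key_sketch(io.StringIO("a\t1\nb\t2\n"), key=lambda row: int(row[1]))
```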
-
xphyle.utils.
read_dict
(path_or_file: Union[os.PathLike, IO, io.IOBase], sep: str = '=', convert: Optional[Callable[str, Any]] = None, ordered: bool = False, **kwargs) → Dict[str, Any]¶ Read lines from simple property file (key=value). Comment lines (starting with ‘#’) are ignored.
Parameters: - path_or_file – Property file, or a list of properties.
- sep – Key-value delimiter (defaults to ‘=’).
- convert – Function to call on each value.
- ordered – Whether to return an OrderedDict.
- kwargs – Additional arguments to pass to :method:`xphyle.open_`.
Returns: An OrderedDict, if ‘ordered’ is True, otherwise a dict.
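A minimal sketch of the key=value parsing described above, assuming the input is any iterable of text lines. parse_properties_sketch is a hypothetical name, and skipping blank lines is an assumption that the documentation does not state.

```python
import io


def parse_properties_sketch(lines, sep="=", convert=None):
    """Parse key=value lines, ignoring comment lines that start with '#'."""
    props = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):  # blank-line skipping is assumed
            continue
        key, value = line.split(sep, 1)  # split on the first separator only
        props[key] = convert(value) if convert else value
    return props


props = parse_properties_sketch(
    io.StringIO("# settings\nthreads=4\nlevel=9\n"), convert=int
)
```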
-
xphyle.utils.
read_lines
(path_or_file: Union[os.PathLike, IO, io.IOBase], convert: Optional[Callable[str, Any]] = None, strip_linesep: bool = True, **kwargs) → Generator[[str, None], None]¶ Iterate over lines in a file.
Parameters: - path_or_file – Path to the file, or a file-like object.
- convert – Function to call on each line in the file.
- strip_linesep – Whether to strip off trailing line separators.
- kwargs – Additional arguments to pass to :method:`xphyle.open_`.
Yields: Lines of a file, with line endings stripped.
-
xphyle.utils.
textinput
(files: Union[os.PathLike, IO, io.IOBase, Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], None] = None)¶ Convenience method that creates a new FileInput in text mode.
Parameters: files – The files to open. If None, files passed on the command line are used, or STDIN if there are no command line arguments.
Returns: A FileInput[Text] instance.
-
xphyle.utils.
textoutput
(files: Union[os.PathLike, IO, io.IOBase, Iterable[Union[os.PathLike, IO, io.IOBase, Tuple[Any, Union[os.PathLike, IO, io.IOBase]]]], None] = None, file_output_type: Callable[..., xphyle.utils.FileOutput[str]] = xphyle.utils.TeeFileOutput[str], **kwargs) → xphyle.utils.FileOutput[str]¶ Convenience function to create a fileoutput in text mode.
Parameters: - files – The files to write to.
- file_output_type – The specific subclass of FileOutput to create.
- kwargs – additional arguments to pass to the FileOutput constructor.
Returns: A FileOutput instance.
-
xphyle.utils.
to_bytes
(value: Any, encoding: str = 'utf-8')¶ Convert an arbitrary value to bytes.
Parameters: - value – Some value.
- encoding – The byte encoding to use.
Returns: value converted to a string and then encoded as bytes.
-
xphyle.utils.
transcode_file
(source_file: Union[os.PathLike, IO, io.IOBase], dest_file: Union[os.PathLike, IO, io.IOBase], source_compression: Union[bool, str] = True, dest_compression: Union[bool, str] = True, use_system: bool = True, source_open_args: Optional[dict] = None, dest_open_args: Optional[dict] = None) → None¶ Convert from one file format to another.
Parameters: - source_file – The path or file-like object to read from. If a file, it must be opened in mode ‘rb’.
- dest_file – The path or file-like object to write to. If a file, it must be opened in binary mode.
- source_compression – The compression type of the source file. If True, guess compression format from the file name, otherwise the name of any supported compression format.
- dest_compression – The compression type of the dest file. If True, guess compression format from the file name, otherwise the name of any supported compression format.
- use_system – Whether to use system-level compression.
- source_open_args – Additional arguments to pass to xopen for the source file.
- dest_open_args – Additional arguments to pass to xopen for the destination file.
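Conceptually, transcoding means reading decompressed bytes from one format and writing them through the compressor of another. The sketch below does this for the gzip to bzip2 case with Python's own gzip and bz2 modules. It is not xphyle's implementation, transcode_gzip_to_bz2_sketch is a hypothetical name, and it has no counterpart to use_system.

```python
import bz2
import gzip
import os
import shutil
import tempfile


def transcode_gzip_to_bz2_sketch(source_path, dest_path):
    """Stream decompressed bytes from a gzip file into a bzip2 file."""
    with gzip.open(source_path, "rb") as src, bz2.open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # copies in chunks, never loads it all


with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "data.txt.gz")
    bz_path = os.path.join(tmp, "data.txt.bz2")
    with gzip.open(gz_path, "wb") as out:
        out.write(b"hello\n")
    transcode_gzip_to_bz2_sketch(gz_path, bz_path)
    with bz2.open(bz_path, "rb") as inp:
        roundtrip = inp.read()
```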
-
xphyle.utils.
uncompressed_size
(path: pathlib.PurePath, compression: Union[bool, str] = None) → Optional[int]¶ Get the uncompressed size of the compressed file.
Parameters: - path – The path to the compressed file.
- compression – None or True, to guess compression format from the file name, or the name of any supported compression format.
Returns: The uncompressed size of the file in bytes, or None if the uncompressed size could not be determined (without actually decompressing the file).
Raises: ValueError if the compression format is not supported.
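Some formats record the uncompressed size in the file itself, which is why it can sometimes be determined without decompressing. For gzip, the last four bytes of a member hold the uncompressed size modulo 2**32 as a little-endian integer. The sketch below reads that field; it is an illustration of the idea, not a claim about how xphyle computes the value, and gzip_isize_sketch is a hypothetical name.

```python
import gzip
import struct


def gzip_isize_sketch(data: bytes) -> int:
    """Return the ISIZE trailer field of a single-member gzip stream."""
    # "<I" = little-endian unsigned 32-bit integer.
    return struct.unpack("<I", data[-4:])[0]


payload = b"x" * 1000
size = gzip_isize_sketch(gzip.compress(payload))
```

Because the field is only 32 bits wide and describes only the final member, the value is unreliable for inputs of 4 GiB or more and for multi-member files.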
-
xphyle.utils.
write_bytes
(iterable: Iterable, path_or_file: Union[os.PathLike, IO, io.IOBase], sep: Optional[bytes] = b'', convert: Callable[Any, bytes] = <function to_bytes>, **kwargs) → int¶ Write an iterable of bytes to a file.
Parameters: - iterable – An iterable.
- path_or_file – Path to the file, or a file-like object.
- sep – Separator between items.
- convert – Function that converts a value to bytes.
- kwargs – Additional arguments to pass to :method:`xphyle.open_`.
Returns: Total number of bytes written, or -1 if errors=False and there was a problem opening the file.
-
xphyle.utils.
write_dict
(dictobj: Dict[str, Any], path_or_file: Union[os.PathLike, IO, io.IOBase], sep: str = '=', linesep: Optional[str] = '\n', convert: Callable[Any, str] = <class 'str'>, **kwargs) → int¶ Write a dict to a file as name=value lines.
Parameters: - dictobj – The dict (or dict-like object).
- path_or_file – Path to the file, or a file-like object.
- sep – The delimiter between key and value (defaults to ‘=’).
- linesep – The delimiter between values, or os.linesep if None (defaults to ‘\n’).
- convert – Function that converts a value to a string.
Returns: Total number of bytes written, or -1 if errors=False and there was a problem opening the file.
-
xphyle.utils.
write_lines
(iterable: Iterable[str], path_or_file: Union[os.PathLike, IO, io.IOBase], linesep: Optional[str] = '\n', convert: Callable[Any, str] = <class 'str'>, **kwargs) → int¶ Write delimiter-separated strings to a file.
Parameters: - iterable – An iterable.
- path_or_file – Path to the file, or a file-like object.
- linesep – The delimiter to use to separate the strings, or os.linesep if None (defaults to ‘\n’).
- convert – Function that converts a value to a string.
- kwargs – Additional arguments to pass to :method:`xphyle.open_`.
Returns: Total number of bytes written, or -1 if errors=False and there was a problem opening the file.
xphyle.paths module¶
Convenience functions for working with file paths.
Stdin, stdout, and stderr are treated as acceptable paths in most cases, which is why the PurePath type (Union[str, pathlib.PurePath]) is used. String paths are still accepted as inputs, but all outputs will be subclasses of pathlib.PurePath.
-
xphyle.paths.
BACKCOMPAT
= True¶ Whether backward compatibility is enabled. By default, backward compatibility is enabled unless environment variable XPHYLE_BACKCOMPAT is set to ‘0’.
-
class
xphyle.paths.
DirSpec
(*path_vars, template: str = None, pattern: Union[str, Pattern[~AnyStr]] = None)¶ Bases:
xphyle.paths.SpecBase
Spec for the directory part of a path.
-
default_pattern
¶ The default filename pattern.
-
default_search_root
() → pathlib.PurePath¶ Get the default root directory for searching.
-
default_var_name
¶ The default variable name used for string formatting.
-
path_part
(path: pathlib.Path) → str¶ Return the part of the absolute path corresponding to the spec type.
-
path_type
¶ The PathType.
-
-
xphyle.paths.
EXECUTABLE_CACHE
= <xphyle.paths.ExecutableCache object>¶ Singleton instance of ExecutableCache.
-
class
xphyle.paths.
ExecutableCache
(default_path: Optional[Iterable[pathlib.PurePath]] = None)¶ Bases:
object
Lookup and cache executable paths.
Parameters: default_path – The default executable path.
-
add_search_path
(paths: Union[str, pathlib.PurePath, Iterable[pathlib.PurePath]]) → None¶ Add directories to the beginning of the executable search path.
Parameters: paths – List of paths, or a string with directories separated by os.pathsep.
-
get_path
(executable: Union[str, pathlib.PurePath]) → pathlib.Path¶ Get the full path of executable.
Parameters: executable – An executable name or path.
Returns: The full path of executable, or None if the path cannot be found.
-
reset_search_path
(default_path: Iterable[pathlib.PurePath] = None) → None¶ Reset the search path to default_path.
Parameters: default_path – The default executable path.
-
resolve_exe
(names: Iterable[str]) → Optional[Tuple[pathlib.Path, str]]¶ Given an iterable of command names, find the first that resolves to an executable.
Parameters: names – An iterable of command names. Returns: A tuple (path, name) of the first command to resolve, or None if none of the commands resolve.
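The "first name that resolves" behaviour can be sketched with shutil.which. This is a rough stand-in, not the ExecutableCache implementation: resolve_first_sketch and its search_path parameter are hypothetical, and no caching is done.

```python
import os
import shutil
import sys


def resolve_first_sketch(names, search_path=None):
    """Return (path, name) for the first name that resolves, else None."""
    for name in names:
        path = shutil.which(name, path=search_path)
        if path is not None:
            return path, name
    return None


# Use the running interpreter as an executable that is known to exist.
exe_dir, exe_name = os.path.split(sys.executable)
resolved = resolve_first_sketch(["no-such-tool-xyz", exe_name], search_path=exe_dir)
missing = resolve_first_sketch(["no-such-tool-xyz"], search_path=exe_dir)
```

Passing names in order of preference lets a caller prefer, for example, a parallel compressor and fall back to the standard one.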
-
-
class
xphyle.paths.
FileSpec
(*path_vars, template: str = None, pattern: Union[str, Pattern[~AnyStr]] = None)¶ Bases:
xphyle.paths.SpecBase
Spec for the filename part of a path.
Examples
    spec = FileSpec(
        PathVar('id', pattern='[A-Z0-9_]+'),
        PathVar('ext', pattern=r'[^.]+'),
        template='{id}.{ext}'
    )
    # get a single file
    path = spec(id='ABC123', ext='txt') # => PathInst('ABC123.txt')
    print(path['id']) # => 'ABC123'
    # get the variable values for a path
    path = spec.parse('ABC123.txt')
    print(path['id']) # => 'ABC123'
    # find all files that match a FileSpec in the user's home directory
    all_paths = spec.find('~') # => [PathInst…]
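One way to picture how a template such as '{id}.{ext}' can serve both for building and for parsing file names is to turn each placeholder into a named regular-expression group. The sketch below uses only the re module. It is an illustration under that assumption, not xphyle's code, and template_to_regex_sketch is a hypothetical helper.

```python
import re


def template_to_regex_sketch(template, patterns):
    """Turn '{name}' placeholders into named capture groups."""
    # re.split with a capture group alternates literal text and names.
    parts = re.split(r"\{(\w+)\}", template)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)  # literal text between placeholders
        else:
            regex += f"(?P<{part}>{patterns[part]})"
    return re.compile(regex)


template = "{id}.{ext}"
spec_regex = template_to_regex_sketch(
    template, {"id": "[A-Z0-9_]+", "ext": "[^.]+"}
)
parsed = spec_regex.fullmatch("ABC123.txt").groupdict()  # parse a name
built = template.format(**parsed)  # build the name back from its values
```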
-
default_pattern
¶ The default filename pattern.
-
default_var_name
¶ The default variable name used for string formatting.
-
path_part
(path: pathlib.Path) → str¶ Return the part of the absolute path corresponding to the spec type.
-
path_type
¶ The PathType.
-
class
xphyle.paths.
PathInst
¶ Bases:
pathlib.PosixPath
A path-like that has a slot for variable values.
-
joinpath
(*other) → xphyle.paths.PathInst¶ Join two path-like objects, including merging ‘values’ dicts.
-
values
¶
-
-
class
xphyle.paths.
PathPathVar
(name: str, undefined: pathlib.PurePath = PosixPath('.'), datatype: Callable[str, pathlib.Path] = <class 'pathlib.Path'>, **kwargs)¶ Bases:
xphyle.paths.PathVar
-
class
xphyle.paths.
PathSpec
(dir_spec: Union[pathlib.PurePath, xphyle.paths.DirSpec], file_spec: Union[str, xphyle.paths.FileSpec])¶ Bases:
object
Specifies a path in terms of a template with named components (“path variables”).
Parameters: - dir_spec – A PurePath if the directory is fixed, otherwise a DirSpec.
- file_spec – A string if the filename is fixed, otherwise a FileSpec.
-
construct
(**kwargs) → xphyle.paths.PathInst¶ Create a new PathInst from this PathSpec using values in kwargs.
Parameters: kwargs – Specify values for path variables. Returns: A PathInst
-
find
(root: Optional[pathlib.PurePath] = None, path_types: Sequence[Union[str, xphyle.types.PathType]] = 'f', recursive: bool = False) → Sequence[xphyle.paths.PathInst]¶ Find all paths matching this PathSpec. The search starts in ‘root’ if it is not None, otherwise it starts in the deepest fixed directory of this PathSpec’s DirSpec.
Parameters: - root – Directory in which to begin the search.
- path_types – Types to return – files (‘f’), directories (‘d’) or both (‘fd’).
- recursive – Whether to search recursively.
Returns: A sequence of PathInst.
-
parse
(path: pathlib.PurePath) → xphyle.paths.PathInst¶ Extract PathVar values from path and create a new PathInst.
Parameters: path – The path to parse Returns: a PathInst
-
class
xphyle.paths.
PathVar
(name: str, optional: bool = False, default: Optional[T] = None, undefined: T = None, pattern: Union[str, Pattern[~AnyStr]] = None, valid: Iterable[T] = None, invalid: Iterable[T] = None, datatype: Callable[str, T] = None)¶ Bases:
typing.Generic
Describes part of a path, used in PathSpec.
Parameters: - name – Path variable name
- optional – Whether this part of the path is optional
- default – A default value for this path variable
- undefined – The value to use when the variable is undefined
- pattern – A pattern that the value must match
- valid – Iterable of valid values
- invalid – Iterable of invalid values
If valid is specified, invalid and pattern are ignored. Otherwise, values are first checked against pattern (if one is specified), then checked against invalid (if specified).
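The precedence rule stated above (valid wins; otherwise pattern is checked, then invalid) can be written out as a small function. is_valid_sketch is a hypothetical name, and the use of re.fullmatch rather than a partial match is an assumption.

```python
import re


def is_valid_sketch(value, pattern=None, valid=None, invalid=None):
    """Apply the valid / pattern / invalid checks in the documented order."""
    if valid is not None:
        # When valid is given, pattern and invalid are ignored.
        return value in valid
    if pattern is not None and re.fullmatch(pattern, value) is None:
        return False
    if invalid is not None and value in invalid:
        return False
    return True


in_valid = is_valid_sketch("b", pattern="x+", valid={"a", "b"})
blocked = is_valid_sketch("abc", pattern="[a-z]+", invalid={"abc"})
allowed = is_valid_sketch("abd", pattern="[a-z]+", invalid={"abc"})
```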
-
as_pattern
() → str¶ Format this variable as a regular expression capture group.
-
xphyle.paths.
STDERR
= PurePosixPath('/dev/stderr')¶ Placeholder for sys.stderr
-
xphyle.paths.
STDERR_STR
= '_'¶ String placeholder for stderr.
-
xphyle.paths.
STDIN
= PurePosixPath('/dev/stdin')¶ Placeholder for sys.stdin.
-
xphyle.paths.
STDIN_OR_STDOUT
= PurePosixPath('-')¶ Placeholder for stdin or stdout, when the access mode is not known.
-
xphyle.paths.
STDIN_OR_STDOUT_STR
= '-'¶ String placeholder for stdin/stdout.
-
xphyle.paths.
STDOUT
= PurePosixPath('/dev/stdout')¶ Placeholder for sys.stdout.
-
class
xphyle.paths.
SpecBase
(*path_vars, template: str = None, pattern: Union[str, Pattern[~AnyStr]] = None)¶ Bases:
object
Base class for DirSpec and FileSpec.
Parameters: - path_vars – Named variables with which to associate parts of a path.
- template – Format string for creating paths from variables.
- pattern – Regular expression for identifying matching paths.
-
construct
(**kwargs) → xphyle.paths.PathInst¶ Create a new PathInst from this spec using values in kwargs.
Parameters: kwargs – Specify values for path variables. Returns: A PathInst.
-
default_pattern
¶ The default filename pattern.
-
default_search_root
() → pathlib.PurePath¶ Get the default root directory for searching.
-
default_var_name
¶ The default variable name used for string formatting.
-
find
(root: Optional[pathlib.PurePath] = None, recursive: bool = False) → Sequence[xphyle.paths.PathInst]¶ Find all paths in root matching this spec.
Parameters: - root – Directory in which to begin the search.
- recursive – Whether to search recursively.
Returns: A sequence of PathInst.
-
parse
(path: Union[str, pathlib.PurePath], fullpath: bool = False) → xphyle.paths.PathInst¶ Extract PathVar values from path and create a new PathInst.
Parameters: - path – The path to parse.
- fullpath – Whether to extract the fully-resolved path.
Returns: a PathInst.
-
path_part
(path: pathlib.Path) → str¶ Return the part of the absolute path corresponding to the spec type.
-
path_type
¶ The PathType.
-
class
xphyle.paths.
StrPathVar
(name: str, undefined: str = '', **kwargs)¶ Bases:
xphyle.paths.PathVar
-
class
xphyle.paths.
TempDir
(permissions: Union[xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]], None] = 'rwx', path_descriptors: Iterable[xphyle.paths.TempPathDescriptor] = None, **kwargs)¶ Bases:
xphyle.paths.TempPathManager
,xphyle.paths.TempPath
Context manager that creates a temporary directory and cleans it up upon exit.
Parameters: - mode – Access mode to set on temp directory. All subdirectories and files will inherit this mode unless explicitly set to be different.
- path_descriptors – Iterable of TempPathDescriptors.
- kwargs – Additional arguments passed to tempfile.mkdtemp.
By default all subdirectories and files inherit the mode of the temporary directory. If TempPathDescriptors are specified, the paths are created before permissions are set, enabling creation of a read-only temporary file system.
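The ordering described above (create the paths first, restrict permissions afterwards) is what makes a read-only temporary file possible. The sketch below shows the same ordering with tempfile and os.chmod only; it does not use TempDir, and restoring the write bit before cleanup is a precaution for platforms that refuse to delete read-only files.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "readonly.txt")
    # 1. Create the file while it is still writable.
    with open(path, "w") as out:
        out.write("data")
    # 2. Only then drop the write permission.
    os.chmod(path, stat.S_IRUSR)
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 3. Restore write access so the directory can be removed everywhere.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```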
-
absolute_path
¶ The absolute path.
-
close
() → None¶ Delete the temporary directory and all files/subdirectories within.
-
make_directory
(desc: xphyle.paths.TempPathDescriptor = None, apply_permissions: bool = True, **kwargs) → pathlib.Path¶ Convenience method; calls make_path with path_type=’d’.
-
make_empty_files
(num_files: int, **kwargs) → Sequence[pathlib.Path]¶ Create randomly-named empty files.
Parameters: - num_files – The number of files to create.
- kwargs – Arguments to pass to TempPathDescriptor.
Returns: A sequence of paths.
-
make_fifo
(desc: xphyle.paths.TempPathDescriptor = None, apply_permissions: bool = True, **kwargs) → pathlib.Path¶ Convenience method; calls make_path with path_type=’|’.
-
make_file
(desc: xphyle.paths.TempPathDescriptor = None, apply_permissions: bool = True, **kwargs) → pathlib.Path¶ Convenience method; calls make_path with path_type=’f’.
-
make_path
(desc: xphyle.paths.TempPathDescriptor = None, apply_permissions: bool = True, **kwargs) → pathlib.Path¶ Create a file or directory within the TempDir.
Parameters: - desc – A TempPathDescriptor.
- apply_permissions – Whether permissions should be applied to the new file/directory.
- kwargs – Arguments to TempPathDescriptor. Ignored unless desc is None.
Returns: The absolute path to the new file/directory.
-
make_paths
(*path_descriptors) → Sequence[pathlib.Path]¶ Create multiple files/directories at once. The paths are created before permissions are set, enabling creation of a read-only temporary file system.
Parameters: path_descriptors – One or more TempPathDescriptor. Returns: A list of the created paths.
-
relative_path
¶ The relative path.
-
class
xphyle.paths.
TempPath
(parent: Union[pathlib.Path, TempPath] = None, permissions: Union[xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]], None] = 'rwx', path_type: Union[str, xphyle.types.PathType] = 'd', root: Optional[TempPathManager] = None)¶ Bases:
object
Base class for temporary files/directories.
Parameters: - parent – The parent directory.
- permissions – The access permissions.
- path_type – ‘f’ = file, ‘d’ = directory.
-
absolute_path
¶ The absolute path.
-
exists
¶ Whether the directory exists.
-
permissions
¶ The permissions of the path. Defaults to the parent’s mode.
-
relative_path
¶ The relative path.
-
set_permissions
(permissions: Union[xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]], None] = None, set_parent: bool = False, additive: bool = False) → Optional[xphyle.types.PermissionSet]¶ Set the permissions for the path.
Parameters: - permissions – The new flags to set. If None, the existing flags are used.
- set_parent – Whether to recursively set the permissions of all parents. This is done additively.
- additive – Whether permissions should be additive (e.g. if permissions == ‘w’ and self.permissions == ‘r’, the new mode is ‘rw’).
Returns: The PermissionSet representing the flags that were set.
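The additive option can be illustrated with plain flag strings. merge_permissions_sketch is a hypothetical helper that models permissions as subsets of 'rwx'; it does not touch the file system or use PermissionSet.

```python
def merge_permissions_sketch(current: str, new: str, additive: bool = False) -> str:
    """Combine permission flag strings, optionally keeping existing flags."""
    flags = set(new)
    if additive:
        flags |= set(current)  # keep what was already granted
    # Emit flags in canonical r, w, x order.
    return "".join(flag for flag in "rwx" if flag in flags)


added = merge_permissions_sketch("r", "w", additive=True)
replaced = merge_permissions_sketch("r", "w")
```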
-
class
xphyle.paths.
TempPathDescriptor
(name: str = None, parent: Union[pathlib.PurePath, xphyle.paths.TempPath, None] = None, permissions: Union[xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]], None] = None, suffix: str = '', prefix: str = '', contents: str = '', path_type: Union[str, xphyle.types.PathType] = 'f', root: Optional[TempPathManager] = None)¶ Bases:
xphyle.paths.TempPath
Describes a temporary file or directory within a TempDir.
Parameters: - name – The file/directory name.
- parent – The parent directory, a TempPathDescriptor.
- permissions – The permissions mode.
- suffix, prefix – The suffix and prefix to use when calling mkstemp or mkdtemp.
- path_type – ‘f’ (for file), ‘d’ (for directory), or ‘|’ (for FIFO).
-
absolute_path
¶ The absolute path.
-
create
(apply_permissions: bool = True) → None¶ Create the file/directory.
Parameters: apply_permissions – Whether to set permissions according to self.permissions.
-
relative_path
¶ The relative path.
-
class
xphyle.paths.
TempPathManager
¶ Bases:
object
Base for classes that manage mapping between paths and TempPathDescriptors.
-
clear
()¶
-
-
xphyle.paths.
abspath
(path: pathlib.PurePath) → pathlib.PurePath¶ Returns the fully resolved path associated with path.
Parameters: path – Relative or absolute path.
Returns: A PurePath; typically a pathlib.Path, but may be STDOUT or STDERR.
Examples
    abspath('foo') # -> /path/to/curdir/foo
    abspath('~/foo') # -> /home/curuser/foo
-
xphyle.paths.
as_path
(path: Union[str, pathlib.PurePath], access: Union[str, xphyle.types.ModeAccess, None] = None) → pathlib.Path¶ Convert a string to a Path. Note that trying to use STDIN/STDOUT/STDERR as actual paths on Windows will result in an error.
Parameters: - path – String to convert. May be a string path, a stdin/stdout/stderr placeholder, or file:// URL. If it is already a Path, it is returned without modification.
- access – The file access mode, to disambiguate stdin/stdout when path is the placeholder (‘-‘).
Returns: A Path instance.
Raises: ValueError if ‘path’ is a stdin/stdout placeholder and ‘access’ is None.
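The need for the access argument comes from '-' being ambiguous: it means stdin when reading and stdout when writing. The following sketch mirrors that rule using the placeholder values listed in this module. placeholder_to_path_sketch is a hypothetical function, and treating any access string starting with 'r' as read access is an assumption.

```python
from pathlib import PurePosixPath

# Placeholder values as documented for xphyle.paths.
STDIN = PurePosixPath("/dev/stdin")
STDOUT = PurePosixPath("/dev/stdout")
STDERR = PurePosixPath("/dev/stderr")


def placeholder_to_path_sketch(path, access=None):
    """Map '-' and '_' to standard-stream paths; pass other strings through."""
    if path == "_":
        return STDERR
    if path == "-":
        if access is None:
            raise ValueError("access is required to disambiguate '-'")
        return STDIN if access.startswith("r") else STDOUT
    return PurePosixPath(path)


reading = placeholder_to_path_sketch("-", "r")
writing = placeholder_to_path_sketch("-", "w")
```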
-
xphyle.paths.
as_pure_path
(path: Union[str, pathlib.PurePath], access: Union[str, xphyle.types.ModeAccess, None] = None) → pathlib.PurePath¶ Convert a string to a PurePath.
Parameters: - path – String to convert. May be a string path, a stdin/stdout/stderr placeholder, or file:// URL. If it is already a PurePath, it is returned without modification.
- access – The file access mode, to disambiguate stdin/stdout when path is the placeholder (‘-‘).
Returns: A PurePath instance. Except when ‘path’ is a PurePath or stdin/stdout/stderr placeholder, the actual return type is a Path instance.
-
xphyle.paths.
check_access
(path: pathlib.PurePath, permissions: Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess, xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]]]) → xphyle.types.PermissionSet¶ Check that path is accessible with the given set of permissions.
Parameters: - path – The path to check.
- permissions – Access specifier (string/int/ModeAccess).
Raises: IOError if the path cannot be accessed according to permissions.
-
xphyle.paths.
check_path
(path: pathlib.PurePath, path_type: Union[str, xphyle.types.PathType] = None, permissions: Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess, xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]]] = None) → pathlib.PurePath¶ Resolves the path (using resolve_path) and checks that the path is of the specified type and allows the specified access.
Parameters: - path – The path to check.
- path_type – A string or PathType (‘f’ or ‘d’).
- permissions – Access flag (string, int, Permission, or PermissionSet).
Returns: The fully resolved path.
Raises: IOError if the path does not exist, is not of the specified type, or doesn’t allow the specified access.
-
xphyle.paths.
check_readable_file
(path: pathlib.PurePath) → pathlib.PurePath¶ Check that path exists and is readable.
Parameters: path – The path to check.
Returns: The fully resolved path of path.
-
xphyle.paths.
check_std
(path: pathlib.PurePath, error: bool = False) → bool¶ Check whether the path is ‘-‘ (stdout) or ‘_’ (stderr).
Parameters: - path – The path to check.
- error – Whether an error should be raised if path is stdout or stderr.
Returns: True if path is stdout or stderr.
Raises: ValueError if path is stdout or stderr and error is True.
-
xphyle.paths.
check_writable_file
(path: pathlib.PurePath, mkdirs: bool = True) → pathlib.PurePath¶ If path exists, check that it is writable, otherwise check that its parent directory exists and is writable.
Parameters: - path – The path to check.
- mkdirs – Whether to create any missing directories (True).
Returns: The fully resolved path.
-
xphyle.paths.
convert_std_placeholder
(path: str, access: Union[str, xphyle.types.FileMode, xphyle.types.ModeAccess, None] = None) → Union[str, pathlib.PurePath]¶
-
xphyle.paths.
deprecated
(msg: str)¶ Issue a deprecation warning.
Parameters: msg – The warning message to display.
-
xphyle.paths.
deprecated_str_to_path
(*args_to_convert, list_args: Optional[Sequence[Union[int, str]]] = None, dict_args: Optional[Sequence[Union[int, str]]] = None) → Callable¶ Decorator for a function that used to take paths as strings and now only takes them as pathlib.PurePath objects. A deprecation warning is issued, and the string arguments are converted to paths before calling the function.
Backward compatibility can be disabled by the XPHYLE_BACKCOMPAT environment variable. If set to false (0), the func is returned immediately.
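The general shape of such a decorator is sketched below with functools and warnings. It is deliberately simplified and is not the xphyle implementation: str_to_path_sketch is a hypothetical name, only keyword arguments are handled, and the XPHYLE_BACKCOMPAT switch is not modelled.

```python
import functools
import warnings
from pathlib import Path


def str_to_path_sketch(*arg_names):
    """Convert the named keyword arguments from str to Path, with a warning."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(**kwargs):
            for name in arg_names:
                value = kwargs.get(name)
                if isinstance(value, str):
                    warnings.warn(
                        f"Passing '{name}' as a string is deprecated",
                        DeprecationWarning,
                        stacklevel=2,
                    )
                    kwargs[name] = Path(value)
            return func(**kwargs)
        return wrapper
    return decorator


@str_to_path_sketch("path")
def suffix_of(path):
    return path.suffix  # only works once path is a Path object


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = suffix_of(path="data.txt.gz")
```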
-
xphyle.paths.
filename
(path: pathlib.PurePath) → str¶ Equivalent to split_path(path)[1].
Parameters: path – The path.
Returns: The filename part of path (without any extensions).
-
xphyle.paths.
find
(root: pathlib.PurePath, pattern: Union[str, Pattern[~AnyStr]], path_types: Sequence[Union[str, xphyle.types.PathType]] = 'f', recursive: bool = True, return_matches: bool = False) → Union[Sequence[pathlib.PurePath], Sequence[Tuple[pathlib.PurePath, Match[~AnyStr]]]]¶ Find all paths under root that match pattern.
Parameters: - root – Directory at which to start search.
- pattern – File name pattern to match (string or re object).
- path_types – Types to return – files (‘f’), directories (‘d’), or both (‘fd’).
- recursive – Whether to search directories recursively.
- return_matches – Whether to return regular expression match for each file.
Returns: List of matching paths. If return_matches is True, each item will be a (path, Match) tuple.
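A pattern-based search of this kind can be approximated with os.walk and re. The sketch below handles files only and always returns (path, Match) pairs, as with return_matches=True. find_files_sketch is a hypothetical name and the matching rule (re.match against the file name) is an assumption.

```python
import os
import re
import tempfile


def find_files_sketch(root, pattern, recursive=True):
    """Return (path, match) pairs for file names matching pattern."""
    regex = re.compile(pattern)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            m = regex.match(fname)
            if m:
                matches.append((os.path.join(dirpath, fname), m))
        if not recursive:
            break  # os.walk visits root first; stop before descending
    return matches


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    for rel in ("a.txt", "c.log", os.path.join("sub", "b.txt")):
        with open(os.path.join(root, rel), "w"):
            pass
    deep = sorted(
        os.path.basename(p) for p, _ in find_files_sketch(root, r"\w+\.txt$")
    )
    shallow = sorted(
        os.path.basename(p)
        for p, _ in find_files_sketch(root, r"\w+\.txt$", recursive=False)
    )
```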
-
xphyle.paths.
get_permissions
(path: pathlib.PurePath) → xphyle.types.PermissionSet¶ Get the permissions of a file/directory.
Parameters: path – Path of file/directory.
Returns: A PermissionSet.
Raises: IOError if the file/directory doesn’t exist.
-
xphyle.paths.
get_root
(path: Optional[pathlib.PurePath] = None) → str¶ Get the root directory.
Parameters: path – A path, or ‘.’ to get the root of the working directory, or None to get the root of the path to the script. Stdout and stderr are not valid arguments. Returns: A string path to the root directory.
-
xphyle.paths.
match_to_dict
(match: Match[~AnyStr], path_vars: Dict[str, xphyle.paths.PathVar], errors: bool = True) → Optional[Dict[str, Any]]¶ Convert a regular expression Match to a dict of (name, value) for all PathVars.
Parameters: - match – A re.Match.
- path_vars – A dict of PathVars.
- errors – If True, raise an exception on validation error, otherwise return None.
Returns: A (name, value) dict.
Raises: ValueError if any values fail validation.
-
xphyle.paths.
path_inst
(path: Union[str, pathlib.PurePath], values: dict = None) → xphyle.paths.PathInst¶ Create a PathInst from a path and values dict.
Parameters: - path – The path.
- values – The values dict.
Returns: A PathInst.
-
xphyle.paths.
resolve_path
(path: pathlib.PurePath, parent: pathlib.PurePath = None) → pathlib.PurePath¶ Resolves the absolute path of the specified file and ensures that the file/directory exists.
Parameters: - path – Path to resolve.
- parent – The directory containing path if path is relative.
Returns: The absolute path.
Raises: IOError – if the path does not exist or is invalid.
-
xphyle.paths.
safe_check_path
(path: pathlib.PurePath, *args, **kwargs) → Optional[pathlib.PurePath]¶ Safe version of check_path. Returns None rather than raising an exception.
-
xphyle.paths.
safe_check_readable_file
(path: pathlib.PurePath) → Optional[pathlib.PurePath]¶ Safe version of check_readable_file. Returns None rather than raising an exception.
-
xphyle.paths.
safe_check_writable_file
(path: pathlib.PurePath) → Optional[pathlib.PurePath]¶ Safe version of check_writable_file. Returns None rather than raising an exception.
-
xphyle.paths.
set_permissions
(path: pathlib.PurePath, permissions: Union[xphyle.types.PermissionSet, Sequence[Union[str, int, xphyle.types.Permission, xphyle.types.ModeAccess]]]) → xphyle.types.PermissionSet¶ Sets file stat flags (using chmod).
Parameters: - path – The file to chmod.
- permissions – Stat flags (any of ‘r’, ‘w’, ‘x’, or a PermissionSet).
Returns: A PermissionSet.
-
xphyle.paths.
split_path
(path: pathlib.PurePath, keep_seps: bool = True, resolve: bool = True) → Tuple[str, ...]¶ Splits a path into a (parent_dir, name, *ext) tuple.
Parameters: - path – The path. Stdout and stderr are not valid arguments.
- keep_seps – Whether the extension separators should be kept as part of the file extensions
- resolve – Whether to resolve the path before splitting
Returns: A tuple of length >= 2, in which the first element is the parent directory, the second element is the file name, and the remaining elements are file extensions.
Examples
    split_path('myfile.foo.txt', False) # -> ('/current/dir', 'myfile', 'foo', 'txt')
    split_path('/usr/local/foobar.gz', True) # -> ('/usr/local', 'foobar', '.gz')
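The (parent_dir, name, *ext) decomposition can be approximated with pathlib. This sketch never resolves the path against the file system and does not special-case hidden files such as '.bashrc'. split_path_sketch is a hypothetical name, not the library function.

```python
from pathlib import PurePosixPath


def split_path_sketch(path, keep_seps=True):
    """Split a path into (parent, name, *extensions) without resolving it."""
    p = PurePosixPath(path)
    name, *exts = p.name.split(".")
    if keep_seps:
        exts = ["." + ext for ext in exts]  # keep the leading dot
    return (str(p.parent), name, *exts)


with_seps = split_path_sketch("/usr/local/foobar.gz")
without_seps = split_path_sketch("/data/myfile.foo.txt", keep_seps=False)
```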
Plugin API¶
You shouldn’t need these modules unless you want to extend xphyle functionality.
xphyle.formats module¶
Interfaces to compression file formats. Magic numbers from: https://en.wikipedia.org/wiki/List_of_file_signatures
-
class
xphyle.formats.
BGzip
¶ Bases:
xphyle.formats.DualExeCompressionFormat
bgzip is block gzip. bgzip files are compatible with gzip. Typically, this format is only used when specifically requested, or when a bgzip file specifically has a .bgz (rather than .gz) extension.
The bgzip program is only used for compression; gzip is used for decompression because bgzip does not support decompressing a file with a non-.gz extension.
-
aliases
¶ All of the aliases by which this format is known.
-
allowed_exts
¶ Extensions that are allowed to be used. Defaults to self.exts.
-
compress_commands
¶
-
compresslevel_range
¶ The range of valid compression levels – (lowest, highest).
-
decompress_commands
¶
-
default_compresslevel
¶ The default compression level, if compression is supported and is user-configurable, otherwise None.
-
exts
¶ The commonly used file extensions.
-
get_compress_command
(src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: int = None) → List[str]¶ Build the compress command for the system executable.
Parameters: - src – The source file path, or STDIN if input should be read from stdin
- stdout – Whether output should go to stdout
- compresslevel – Integer compression level; typically 1-9
Returns: List of command arguments
-
get_decompress_command
(src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True) → List[str]¶ Build the decompress command for the system executable.
Parameters: - src – The source file path, or STDIN if input should be read from stdin
- stdout – Whether output should go to stdout
Returns: List of command arguments
-
magic_bytes
¶ The initial bytes that indicate the file type.
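Magic bytes allow a format to be guessed from file content rather than from the extension. The sketch below checks two well-known signatures (1f 8b for gzip, 'BZh' for bzip2). The MAGIC table and guess_format_sketch are hypothetical names for illustration and are not xphyle's detection code.

```python
import bz2
import gzip

# Well-known leading signatures of two compression formats.
MAGIC = {
    "gzip": b"\x1f\x8b",
    "bzip2": b"BZh",
}


def guess_format_sketch(data: bytes):
    """Return the format whose signature prefixes data, or None."""
    for name, magic in MAGIC.items():
        if data.startswith(magic):
            return name
    return None


gz_guess = guess_format_sketch(gzip.compress(b"payload"))
bz_guess = guess_format_sketch(bz2.compress(b"payload"))
plain_guess = guess_format_sketch(b"plain text")
```

Because bgzip files are gzip-compatible, a signature check alone cannot tell the two apart, which is consistent with bgzip being selected only on request or by extension.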
-
mime_types
¶ The MIME types.
-
module_name
¶
-
name
¶ The canonical format name.
-
open_file_python
(path_or_file: Union[os.PathLike, IO, io.IOBase], mode: Union[str, xphyle.types.FileMode], **kwargs) → Union[IO, io.IOBase]¶ Open a file using the python library.
Parameters: - path_or_file – The file to open – a path or open file object.
- mode – The file open mode.
- kwargs – Additional arguments to pass to the open method.
Returns: A file-like object.
-
-
class
xphyle.formats.
BZip2
¶ Bases:
xphyle.formats.SingleExeCompressionFormat
Implementation of CompressionFormat for bzip2 files.
-
compresslevel_range
¶ The range of valid compression levels – (lowest, highest).
-
default_compresslevel
¶ The default compression level, if compression is supported and is user-configurable, otherwise None.
-
exts
¶ The commonly used file extensions.
-
get_command
(operation: str, src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: Optional[int] = 6) → List[str]¶ Build the command for the system executable.
Parameters: - operation – ‘c’ = compress, ‘d’ = decompress
- src – The source file path, or STDIN if input should be read from stdin
- stdout – Whether output should go to stdout
- compresslevel – Integer compression level; typically 1-9
Returns: List of command arguments
-
magic_bytes
¶ The initial bytes that indicate the file type.
-
mime_types
¶ The MIME types.
-
name
¶ The canonical format name.
-
open_file_python
(path_or_file: Union[os.PathLike, IO, io.IOBase], mode: Union[str, xphyle.types.FileMode], **kwargs) → Union[IO, io.IOBase]¶ Open a file using the python library.
Parameters: - path_or_file – The file to open – a path or open file object.
- mode – The file open mode.
- kwargs – Additional arguments to pass to the open method.
Returns: A file-like object.
-
system_commands
¶ The names of the system-level commands, in order of preference.
-
-
class
xphyle.formats.
CompressionFormat
¶ Bases:
xphyle.formats.FileFormat
Base class for classes that provide access to system-level and python-level implementations of compression formats.
-
aliases
¶ All of the aliases by which this format is known.
-
allowed_exts
¶ Extensions that are allowed to be used. Defaults to self.exts.
-
can_use_system_compression
¶ Whether at least one command in
self.system_commands
resolves to an existing, executable file.
-
can_use_system_decompression
¶ Whether at least one command in
self.system_commands
resolves to an existing, executable file.
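As a rough illustration of the check described above, the following sketch resolves a list of candidate commands in order of preference using only the standard library. The helper name `first_executable` and the candidate names are hypothetical and are not part of xphyle.

```python
import shutil
import sys
from typing import Iterable, Optional


def first_executable(commands: Iterable[str]) -> Optional[str]:
    """Return the resolved path of the first usable command, or None."""
    for command in commands:
        # shutil.which returns None if the command is missing or not executable.
        path = shutil.which(command)
        if path is not None:
            return path
    return None


# The running interpreter stands in for a real compressor here, only because
# it is an executable that is expected to exist on the machine running this.
resolved = first_executable(["no-such-compressor-xyz-123", sys.executable])
can_use_system = resolved is not None
```

Listing the preferred (for example, parallel) implementation first means it is chosen whenever it is installed, with the standard tool as the fallback.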
-
compress
(raw_bytes: bytes, **kwargs) → bytes¶ Compress bytes.
Parameters: - raw_bytes – The bytes to compress
- kwargs – Additional arguments to compression function.
Returns: The compressed bytes
-
compress_file
(source: Union[os.PathLike, IO, io.IOBase], dest: Union[os.PathLike, IO, io.IOBase] = None, keep: bool = True, compresslevel: int = None, use_system: bool = True, **kwargs) → pathlib.PurePath¶ Compress data from one file and write to another.
Parameters: - source – Source file, either a path or an open file-like object.
- dest – Destination file, either a path or an open file-like object.
If None, the file name is determined from
source
. - keep – Whether to keep the source file.
- compresslevel – Compression level.
- use_system – Whether to try to use system-level compression.
- kwargs – Additional arguments to pass to the open method when opening the destination file.
Returns: Path to the destination file.
Raises: IOError if there is an error compressing the file.
-
compress_iterable
(strings: Iterable[str], delimiter: bytes = b'', encoding: str = 'utf-8', **kwargs) → bytes¶ Compress an iterable of strings using the python-level interface.
Parameters: - strings – An iterable of strings
- delimiter – The delimiter (byte string) to use to separate strings
- encoding – The byte encoding (utf-8)
- kwargs – Additional arguments to compression function
Returns: The compressed text, as bytes
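A minimal sketch of what compressing an iterable of strings might look like at the python level, assuming gzip as the format. The function name `compress_iterable_sketch` is hypothetical; it only mirrors the parameters documented above.

```python
import gzip
from typing import Iterable


def compress_iterable_sketch(
    strings: Iterable[str], delimiter: bytes = b"", encoding: str = "utf-8"
) -> bytes:
    """Encode each string, join with the delimiter, and gzip the result."""
    joined = delimiter.join(s.encode(encoding) for s in strings)
    return gzip.compress(joined)


compressed = compress_iterable_sketch(["foo", "bar"], delimiter=b"\n")
roundtrip = gzip.decompress(compressed).decode("utf-8")
```

Note that the delimiter is a byte string, so it is applied after the strings have been encoded.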
-
compress_name
¶ The name of the compression program.
-
compress_path
¶ The path of the compression program.
-
compress_string
(text: str, encoding: str = 'utf-8', **kwargs) → bytes¶ Compress a string.
Parameters: - text – The text to compress
- encoding – The byte encoding (utf-8)
- kwargs – Additional arguments to compression function
Returns: The compressed text, as bytes
-
compresslevel_range
¶ The range of valid compression levels – (lowest, highest).
-
decompress
(compressed_bytes, **kwargs) → bytes¶ Decompress bytes.
Parameters: - compressed_bytes – The compressed data
- kwargs – Additional arguments to the decompression function
Returns: The decompressed bytes
-
decompress_file
(source: Union[os.PathLike, IO, io.IOBase], dest: Union[os.PathLike, IO, io.IOBase, None] = None, keep: bool = True, use_system: bool = True, **kwargs) → pathlib.PurePath¶ Decompress data from one file and write to another.
Parameters: - source – Source file, either a path or an open file-like object.
- dest – Destination file, either a path or an open file-like object.
If None, the file name is determined from
source
. - keep – Whether to keep the source file.
- use_system – Whether to try to use system-level compression.
- kwargs – Additional arguments to pass to the open method when opening the compressed file.
Returns: Path to the destination file.
Raises: IOError if there is an error decompressing the file.
-
decompress_name
¶ The name of the decompression program.
-
decompress_path
¶ The path of the decompression program.
-
decompress_string
(compressed_bytes: bytes, encoding: str = 'utf-8', **kwargs) → str¶ Decompress bytes and return as a string.
Parameters: - compressed_bytes – The compressed data
- encoding – The byte encoding to use
- kwargs – Additional arguments to the decompression function
Returns: The decompressed data as a string
-
default_compresslevel
¶ The default compression level, if compression is supported and is user-configurable, otherwise None.
-
default_ext
¶ The default file extension for this format.
-
exts
¶ The commonly used file extensions.
-
get_command
(operation: str, src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: int = None) → List[str]¶ Build the command for the system executable.
Parameters: - operation – ‘c’ = compress, ‘d’ = decompress
- src – The source file path, or STDIN if input should be read from stdin
- stdout – Whether output should go to stdout
- compresslevel – Integer compression level; typically 1-9
Returns: List of command arguments
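The sketch below shows one way an argument list of this kind could be assembled for the gzip executable. It is an assumption about the general shape, not the library's implementation; `build_gzip_command` and its `exe` parameter are hypothetical names. The flags used (`-d`, `-c`, and `-1` through `-9`) are standard gzip options.

```python
from pathlib import PurePath, PurePosixPath
from typing import List, Optional

STDIN = PurePosixPath("/dev/stdin")


def build_gzip_command(
    operation: str,
    src: PurePath = STDIN,
    stdout: bool = True,
    compresslevel: Optional[int] = None,
    exe: str = "gzip",
) -> List[str]:
    """Build a gzip argument list; operation is 'c' (compress) or 'd' (decompress)."""
    if operation not in ("c", "d"):
        raise ValueError(f"Invalid operation: {operation!r}")
    cmd = [exe]
    if operation == "d":
        cmd.append("-d")
    elif compresslevel is not None:
        cmd.append(f"-{compresslevel}")  # e.g. -6
    if stdout:
        cmd.append("-c")  # write to stdout rather than replacing the file
    if src != STDIN:
        cmd.append(str(src))  # with no file argument, gzip reads stdin
    return cmd
```

A list of arguments (rather than a single shell string) can be passed directly to `subprocess.Popen` without shell quoting concerns.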
-
get_list_command
(path: pathlib.PurePath) → Optional[List[str]]¶ Get the command to list contents of a compressed file.
Parameters: path – Path to the compressed file. Returns: List of command arguments, or None if the uncompressed size cannot be determined (without actually decompressing the file).
-
handle_command_return
(returncode: int, cmd: List[str], stderr: bytes = None) → None¶ Handle the returned values from executing a system-level command.
Parameters: - returncode – The returncode from the command (typically, anything other than 0 is an error).
- cmd – The command that generated the return value.
- stderr – The standard error from the command.
Raises: IOError if the command output represents an error.
-
magic_bytes
¶ The initial bytes that indicate the file type.
-
mime_types
¶ The MIME types.
-
name
¶ The canonical format name.
-
open_file
(path: pathlib.PurePath, mode: Union[str, xphyle.types.FileMode], use_system: bool = True, **kwargs) → Union[IO, io.IOBase]¶ Opens a compressed file for reading or writing.
If
use_system
is True and the system provides an accessible executable, then system-level compression is used. Otherwise, the python implementation is used.
Parameters: - path – The path of the file to open.
- mode – The file open mode.
- use_system – Whether to attempt to use system-level compression.
- kwargs – Additional arguments to pass to the python-level open method, if system-level compression isn’t used.
Returns: A file-like object.
-
open_file_python
(path_or_file: Union[os.PathLike, IO, io.IOBase], mode: Union[str, xphyle.types.FileMode], **kwargs) → Union[IO, io.IOBase]¶ Open a file using the python library.
Parameters: - path_or_file – The file to open – a path or open file object.
- mode – The file open mode.
- kwargs – Additional arguments to pass to the open method.
Returns: A file-like object.
-
parse_file_listing
(listing: str) → Tuple[int, int, float]¶ Parse the result of the list command.
Parameters: listing – The output of executing the list command. Returns: A tuple (<compressed size in bytes>, <uncompressed size in bytes>, <compression ratio>).
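To illustrate the kind of parsing involved, here is a sketch that reads a listing in the layout typically printed by `gzip -l` (a header row followed by a row of values). The sample text is made up for the example, the function name is hypothetical, and returning the ratio as a fraction rather than a percentage is an assumption.

```python
from typing import Tuple


def parse_gzip_listing(listing: str) -> Tuple[int, int, float]:
    """Return (compressed bytes, uncompressed bytes, ratio as a fraction)."""
    # The last non-empty line holds the values; the line before it is the header.
    rows = [line for line in listing.splitlines() if line.strip()]
    fields = rows[-1].split()
    compressed = int(fields[0])
    uncompressed = int(fields[1])
    ratio = float(fields[2].rstrip("%")) / 100
    return compressed, uncompressed, ratio


# Illustrative sample only; not captured from a real run.
sample = (
    "         compressed        uncompressed  ratio uncompressed_name\n"
    "                 52                 120  56.7% example.txt\n"
)
sizes = parse_gzip_listing(sample)
```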
-
system_commands
¶ The names of the system-level commands, in order of preference.
-
uncompressed_size
(path: pathlib.PurePath) → Optional[int]¶ Get the uncompressed size of a compressed file.
Parameters: path – Path to the compressed file. Returns: The uncompressed size of the file in bytes, or None if the uncompressed size cannot be determined (without actually decompressing the file).
-
-
class
xphyle.formats.
DualExeCompressionFormat
¶ Bases:
xphyle.formats.CompressionFormat
CompressionFormat that uses different executables for compressing and decompressing.
-
compress_commands
¶
-
compress_lib
¶ Caches and returns the python module for compressing this file format.
Returns: The module Raises: ImportError if the module cannot be imported.
-
compress_name
¶ The name of the compression program.
-
compress_path
¶ The path of the compression program.
-
decompress_commands
¶
-
decompress_lib
¶ Caches and returns the python module for decompressing this file format.
Returns: The module Raises: ImportError if the module cannot be imported.
-
decompress_name
¶ The name of the decompression program.
-
decompress_path
¶ The path of the decompression program.
-
get_command
(operation: str, src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: Optional[int] = None) → List[str]¶ Build the command for the system executable.
Parameters: - operation – ‘c’ = compress, ‘d’ = decompress
- src – The source file path, or STDIN if input should be read from stdin
- stdout – Whether output should go to stdout
- compresslevel – Integer compression level; typically 1-9
Returns: List of command arguments
-
get_compress_command
(src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: int = None) → List[str]¶ Build the compress command for the system executable.
Parameters: - src – The source file path, or STDIN if input should be read from stdin
- stdout – Whether output should go to stdout
- compresslevel – Integer compression level; typically 1-9
Returns: List of command arguments
-
get_decompress_command
(src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True) → List[str]¶ Build the decompress command for the system executable.
Parameters: - src – The source file path, or STDIN if input should be read from stdin
- stdout – Whether output should go to stdout
Returns: List of command arguments
-
system_commands
¶ The names of the system-level commands, in order of preference.
-
-
class
xphyle.formats.
FileFormat
¶ Bases:
abc.ABC
Base class for classes that wrap built-in python file format libraries. The subclass must provide the
name
member.-
lib
¶ Caches and returns the python module associated with this file format.
Returns: The module Raises: ImportError if the module cannot be imported.
-
module_name
¶
-
name
¶
-
-
class
xphyle.formats.
Formats
¶ Bases:
object
Manages a set of compression formats.
-
compression_format_aliases
= None¶ Dict mapping aliases to compression format names.
-
compression_formats
= None¶ Dict of registered compression formats
-
get_compression_format
(name: str) → xphyle.formats.CompressionFormat¶ Returns the CompressionFormat associated with the given name.
Raises: ValueError if that format is not supported.
-
get_compression_format_name
(alias: str)¶ Returns the canonical name for the given alias.
-
get_format_for_mime_type
(mime_type: str) → str¶ Returns the file format associated with a MIME type, or None if no format is associated with the mime type.
-
guess_compression_format
(name: Union[str, pathlib.PurePath]) → Optional[str]¶ Guess the compression format by name or file extension.
Returns: The format name, or None
if it could not be guessed.
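A sketch of guessing by name or extension, under the assumption that the input is either a known format name or a path whose final suffix identifies the format. The table and the function `guess_by_name` are illustrative and do not reflect the library's registered formats.

```python
from pathlib import PurePath
from typing import Optional, Union

# Illustrative extension table (extension without the leading dot -> format name).
EXTENSIONS = {"gz": "gzip", "bz2": "bz2", "xz": "lzma", "zst": "zstd"}
NAMES = set(EXTENSIONS.values())


def guess_by_name(name: Union[str, PurePath]) -> Optional[str]:
    """Return a format name if `name` is a format name or has a known suffix."""
    text = str(name)
    if text in NAMES:
        return text
    suffix = PurePath(text).suffix.lstrip(".")
    return EXTENSIONS.get(suffix)
```

Only the final suffix is consulted, so `reads.fastq.gz` is matched on `gz`.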
-
guess_format_from_buffer
(buffer: _io.BufferedReader) → Optional[str]¶ Guess file format from a byte buffer that provides a
peek
method.
Parameters: buffer – The buffer object Returns: The format name, or None
if it could not be guessed.
-
guess_format_from_file_header
(path: pathlib.PurePath) → Optional[str]¶ Guess file format from ‘magic bytes’ at the beginning of the file.
Note that
path
must be openable and readable. If it is a named pipe or other pseudo-file type, the magic bytes will be destructively consumed and thus the file will not open correctly afterwards.
Parameters: path – Path to the file Returns: The format name, or None
if it could not be guessed.
-
guess_format_from_header_bytes
(header_bytes: bytes) → Optional[str]¶ Guess file format from a sequence of bytes from a file header.
Parameters: header_bytes – The bytes Returns: The format name, or None
if it could not be guessed.
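The following sketch shows magic-byte detection using an index keyed on the first byte, similar in spirit to the dict of (format, rest_of_sequence) tuples described for this class. The signatures are the well-known leading bytes of gzip, bzip2, xz, and zstd files; the format names and the helper `guess_from_header` are illustrative assumptions.

```python
import gzip
from typing import Dict, List, Optional, Tuple

# Well-known file signatures; the names on the left are illustrative.
SIGNATURES = {
    "gzip": b"\x1f\x8b",
    "bz2": b"BZh",
    "lzma": b"\xfd\x37\x7a\x58\x5a\x00",
    "zstd": b"\x28\xb5\x2f\xfd",
}

# first byte -> list of (format name, remaining bytes of the signature)
INDEX: Dict[int, List[Tuple[str, bytes]]] = {}
for fmt, signature in SIGNATURES.items():
    INDEX.setdefault(signature[0], []).append((fmt, signature[1:]))


def guess_from_header(header: bytes) -> Optional[str]:
    """Return the format whose signature prefixes `header`, or None."""
    if not header:
        return None
    for fmt, rest in INDEX.get(header[0], []):
        if header[1:1 + len(rest)] == rest:
            return fmt
    return None


guessed = guess_from_header(gzip.compress(b"hello"))
```

Indexing on the first byte keeps the lookup cheap when many formats are registered, since only signatures sharing that byte need a full comparison.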
-
has_compatible_extension
(dest_fmt, ext_fmt) → bool¶ Checks that dest_fmt is allowed to use a file extension supported by ext_fmt. This is mostly to handle the special case where dest_fmt and ext_fmt allow the same extension and the actual format cannot be detected from the file header.
Returns: True if an allowed extension of dest_fmt is supported by ext_fmt else False.
-
list_compression_formats
()¶ Returns a list of all registered compression formats.
-
list_extensions
(with_sep: bool = False) → Iterable[str]¶ Returns an iterable with all valid extensions.
Parameters: with_sep – Add separator prefix to each extension.
-
magic_bytes
= None¶ Dict mapping the first byte in a ‘magic’ sequence to a tuple of (format, rest_of_sequence)
-
max_magic_bytes
= None¶ Maximum number of bytes in a registered magic byte sequence
-
mime_types
= None¶ Dict mapping MIME types to file formats
-
register_compression_format
(format_class: Callable[xphyle.formats.CompressionFormat]) → None¶ Register a new compression format.
Parameters: format_class – a subclass of CompressionFormat
-
-
class
xphyle.formats.
Gzip
¶ Bases:
xphyle.formats.SingleExeCompressionFormat
Implementation of CompressionFormat for gzip files.
-
compresslevel_range
¶ The compression level; pigz allows 0-11 (har har) while gzip allows 0-9.
-
default_compresslevel
¶ The default compression level, if compression is supported and is user-configurable, otherwise None.
-
exts
¶ The commonly used file extensions.
-
get_command
(operation: str, src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: int = None) → List[str]¶ Build the command for the system executable.
Parameters: - operation – ‘c’ = compress, ‘d’ = decompress
- src – The source file path, or STDIN if input should be read from stdin
- stdout – Whether output should go to stdout
- compresslevel – Integer compression level; typically 1-9
Returns: List of command arguments
-
get_list_command
(path: pathlib.PurePath) → List[str]¶ Get the command to list contents of a compressed file.
Parameters: path – Path to the compressed file. Returns: List of command arguments, or None if the uncompressed size cannot be determined (without actually decompressing the file).
-
handle_command_return
(returncode: int, cmd: List[str], stderr: bytes = None) → None¶ Handle the returned values from executing a system-level command.
Parameters: - returncode – The returncode from the command (typically, anything other than 0 is an error).
- cmd – The command that generated the return value.
- stderr – The standard error from the command.
Raises: IOError if the command output represents an error.
-
magic_bytes
¶ The initial bytes that indicate the file type.
-
mime_types
¶ The MIME types.
-
name
¶ The canonical format name.
-
open_file_python
(path_or_file: Union[os.PathLike, IO, io.IOBase], mode: Union[str, xphyle.types.FileMode], **kwargs) → Union[IO, io.IOBase]¶ Open a file using the python library.
Parameters: - path_or_file – The file to open – a path or open file object.
- mode – The file open mode.
- kwargs – Additional arguments to pass to the open method.
Returns: A file-like object.
-
parse_file_listing
(listing: str) → Tuple[int, int, float]¶ Parse the result of the list command.
Parameters: listing – The output of executing the list command. Returns: A tuple (<compressed size in bytes>, <uncompressed size in bytes>, <compression ratio>).
-
system_commands
¶ The names of the system-level commands, in order of preference.
-
-
class
xphyle.formats.
Lzma
¶ Bases:
xphyle.formats.SingleExeCompressionFormat
Implementation of CompressionFormat for lzma (.xz) files.
-
compress
(raw_bytes: bytes, **kwargs) → bytes¶ Compress bytes.
Parameters: - raw_bytes – The bytes to compress
- kwargs – Additional arguments to compression function.
Returns: The compressed bytes
-
compresslevel_range
¶ The range of valid compression levels – (lowest, highest).
-
default_compresslevel
¶ The default compression level, if compression is supported and is user-configurable, otherwise None.
-
exts
¶ The commonly used file extensions.
-
get_command
(operation: str, src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: Optional[int] = 6) → List[str]¶ Build the command for the system executable.
Parameters: - operation – ‘c’ = compress, ‘d’ = decompress
- src – The source file path, or STDIN if input should be read from stdin
- stdout – Whether output should go to stdout
- compresslevel – Integer compression level; typically 1-9
Returns: List of command arguments
-
get_list_command
(path: pathlib.PurePath) → List[str]¶ Get the command to list contents of a compressed file.
Parameters: path – Path to the compressed file. Returns: List of command arguments, or None if the uncompressed size cannot be determined (without actually decompressing the file).
-
magic_bytes
¶ The initial bytes that indicate the file type.
-
mime_types
¶ The MIME types.
-
name
¶ The canonical format name.
-
parse_file_listing
(listing: str) → Tuple[int, int, float]¶ Parse the result of the list command.
Parameters: listing – The output of executing the list command. Returns: A tuple (<compressed size in bytes>, <uncompressed size in bytes>, <compression ratio>).
-
system_commands
¶ The names of the system-level commands, in order of preference.
-
-
class
xphyle.formats.
SingleExeCompressionFormat
¶ Bases:
xphyle.formats.CompressionFormat
Base class for CompressionFormats that use the same executable for compressing and decompressing.
-
compress_name
¶ The name of the compression program.
-
compress_path
¶ The path of the compression program.
-
decompress_name
¶ The name of the decompression program.
-
decompress_path
¶ The path of the decompression program.
-
executable_name
¶ The name of the system executable.
-
executable_path
¶ The path of the system executable.
-
-
class
xphyle.formats.
SystemIO
(path: pathlib.PurePath)¶ Bases:
xphyle.types.FileLikeBase
Base class for SystemReader and SystemWriter.
Parameters: path – The file path. -
closed
¶
-
name
¶
-
-
class
xphyle.formats.
SystemReader
(executable_path: pathlib.PurePath, path: pathlib.PurePath, command: List[str], executable_name: str = None)¶ Bases:
xphyle.formats.SystemIO
Read from a compressed file using a system-level compression program.
Parameters: - executable_path – The fully resolved path to the system executable
- path – The compressed file to read
- command – List of command arguments.
- executable_name – The display name of the executable, or
None
to use the basename of executable_path
-
close
() → None¶ Close the reader; terminates the underlying process.
-
flush
() → None¶ Implementing file interface; no-op.
-
mode
¶
-
read
(*args) → bytes¶ Read bytes from the stream. Arguments are passed through to the subprocess
read
method.
-
readable
() → bool¶ Implementing file interface; returns True.
-
class
xphyle.formats.
SystemWriter
(executable_path: pathlib.PurePath, path: pathlib.PurePath, mode: Union[str, xphyle.types.FileMode] = 'w', command: List[str] = None, executable_name: str = None)¶ Bases:
xphyle.formats.SystemIO
Write to a compressed file using a system-level compression program.
Parameters: - executable_path – The fully resolved path to the system executable.
- path – The compressed file to write.
- mode – The write mode (w/a/x).
- command – Format string with two variables –
exe
(the path to the system executable), and path
. - executable_name – The display name of the executable, or
None
to use the basename of executable_path
.
-
close
() → None¶ Close the writer; terminates the underlying process.
-
flush
() → None¶ Flush stdin of the underlying process.
-
mode
¶
-
writable
() → bool¶ Implementing file interface; returns True.
-
write
(arg) → int¶ Write to stdin of the underlying process.
-
xphyle.formats.
THREADS
= <xphyle.formats.ThreadsVar object>¶ Number of concurrent threads that can be used by formats that support parallelization.
-
class
xphyle.formats.
ThreadsVar
(default_value: int = 1)¶ Bases:
object
Maintain
threads
variable.-
update
(threads: Optional[int] = True) → None¶ Update the number of threads to use.
Parameters: threads – True = use all available cores; False or an int <= 1 means single-threaded; None means reset to the default value; otherwise an integer number of threads.
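The rules for the threads argument can be sketched as a small pure function. This is an illustration of the documented semantics only; `resolve_threads` is a hypothetical helper, and using `os.cpu_count()` for "all available cores" is an assumption.

```python
import os
from typing import Optional, Union


def resolve_threads(threads: Union[bool, int, None], default: int = 1) -> int:
    """Map the documented `threads` values to a concrete thread count."""
    if threads is None:
        return default  # reset to the default value
    if threads is True:
        return os.cpu_count() or 1  # use all available cores
    if threads is False or threads <= 1:
        return 1  # single-threaded
    return int(threads)
```

Because `bool` is a subclass of `int` in Python, the `True` and `False` cases are tested by identity before the integer comparison.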
-
-
class
xphyle.formats.
Zstd
¶ Bases:
xphyle.formats.SingleExeCompressionFormat
Implementation of CompressionFormat for zstd (.zst) files.
-
compresslevel_range
¶ The range of valid compression levels – (lowest, highest).
-
default_compresslevel
¶ The default compression level, if compression is supported and is user-configurable, otherwise None.
-
exts
¶ The commonly used file extensions.
-
get_command
(operation: str, src: pathlib.PurePath = PurePosixPath('/dev/stdin'), stdout: bool = True, compresslevel: int = None) → List[str]¶ Build the command for the system executable.
Parameters: - operation – ‘c’ = compress, ‘d’ = decompress
- src – The source file path, or STDIN if input should be read from stdin
- stdout – Whether output should go to stdout
- compresslevel – Integer compression level; typically 1-9
Returns: List of command arguments
-
get_list_command
(path: pathlib.PurePath) → List[str]¶ Get the command to list contents of a compressed file.
Parameters: path – Path to the compressed file. Returns: List of command arguments, or None if the uncompressed size cannot be determined (without actually decompressing the file).
-
magic_bytes
¶ The initial bytes that indicate the file type.
-
mime_types
¶ The MIME types.
-
module_name
¶
-
name
¶ The canonical format name.
-
open_file_python
(path_or_file: Union[os.PathLike, IO, io.IOBase], mode: Union[str, xphyle.types.FileMode], **kwargs) → Union[IO, io.IOBase]¶ Open a file using the python library.
Parameters: - path_or_file – The file to open – a path or open file object.
- mode – The file open mode.
- kwargs – Additional arguments to pass to the open method.
Returns: A file-like object.
-
parse_file_listing
(listing: str) → Tuple[int, int, float]¶ Parse the result of the list command.
Parameters: listing – The output of executing the list command. Returns: A tuple (<compressed size in bytes>, <uncompressed size in bytes>, <compression ratio>).
-
-
xphyle.formats.
compression_format
(cls)¶ Required decorator on concrete CompressionFormat subclasses. Registers the CompressionFormat in FORMATS.
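A registering class decorator of this kind generally follows the pattern sketched below. All names here (`Registry`, `register_format`, `DemoFormat`) are hypothetical stand-ins and do not describe the actual FORMATS object.

```python
from typing import Dict


class Registry:
    """Toy registry that maps format names and aliases to format instances."""

    def __init__(self) -> None:
        self.formats: Dict[str, object] = {}
        self.aliases: Dict[str, str] = {}

    def register(self, format_class: type) -> None:
        fmt = format_class()
        self.formats[fmt.name] = fmt
        for alias in fmt.aliases:
            self.aliases[alias] = fmt.name


REGISTRY = Registry()


def register_format(cls: type) -> type:
    """Class decorator: register the class, then return it unchanged."""
    REGISTRY.register(cls)
    return cls


@register_format
class DemoFormat:
    name = "demo"
    aliases = ("demo", "dm")
```

Returning the class unchanged keeps the decorated name usable as a normal class while the registration happens as a side effect at import time.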
xphyle.progress module¶
Common interface to enable operations to be wrapped in a progress bar. By default, pokrok is used for python-level operations and pv for system-level operations.
-
class
xphyle.progress.
IterableProgress
(default_wrapper: Callable = <function progress_iter>)¶ Bases:
object
Manages the python-level wrapper.
Parameters: default_wrapper – Callable (typically a class) that returns a Callable with the signature of wrap
.-
update
(enable: Optional[bool] = None, wrapper: Optional[Callable[..., Iterable]] = None) → None¶ Enable the python progress bar and/or set a new wrapper.
Parameters: - enable – Whether to enable use of a progress wrapper.
- wrapper – A callable that takes three arguments, itr, desc, size, and returns an iterable.
-
wrap
(itr: Iterable, desc: Optional[str] = None, size: Optional[int] = None) → Iterable¶ Wrap an iterable in a progress bar.
Parameters: - itr – The Iterable to wrap.
- desc – Optional description.
- size – Optional max value of the progress bar.
Returns: The wrapped Iterable.
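A custom wrapper only needs to accept the three arguments itr, desc, and size and return an iterable. The sketch below is a stand-in that counts items and reports a summary to stderr instead of drawing a bar; `counting_wrapper` is a hypothetical name.

```python
import sys
from typing import Iterable, Iterator, Optional


def counting_wrapper(
    itr: Iterable, desc: Optional[str] = None, size: Optional[int] = None
) -> Iterator:
    """Yield items unchanged, then print how many were seen."""
    count = 0
    for count, item in enumerate(itr, start=1):
        yield item
    total = f"/{size}" if size is not None else ""
    print(f"{desc or 'items'}: {count}{total}", file=sys.stderr)


lines = list(counting_wrapper(["a", "b", "c"], desc="lines", size=3))
```

A callable with this signature is the sort of object that could be passed as the wrapper argument of update.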
-
-
class
xphyle.progress.
ProcessProgress
(default_wrapper: Callable = <function pv_command>)¶ Bases:
object
Manage the system-level progress wrapper.
Parameters: default_wrapper – Callable that returns the argument list for the default wrapper command. -
update
(enable: Optional[bool] = None, wrapper: Union[str, Sequence[str], None] = None) → None¶ Enable the system progress bar and/or set the wrapper command.
Parameters: - enable – Whether to enable use of a progress wrapper.
- wrapper – A command string or sequence of command arguments.
-
wrap
(cmd: Sequence[str], stdin: Union[IO, io.IOBase], stdout: Union[IO, io.IOBase], **kwargs) → subprocess.Popen¶ Pipe a system command through a progress bar program.
For the process to be wrapped, one of
stdin
,stdout
must not be None.
Parameters: - cmd – Command arguments.
- stdin – File-like object to read into the process stdin, or None to use PIPE.
- stdout – File-like object to write from the process stdout, or None to use PIPE.
- kwargs – Additional arguments to pass to Popen.
Returns: Open process.
-
-
xphyle.progress.
iter_file_chunked
(fileobj: Union[IO, io.IOBase], chunksize: int = 1024) → Iterable¶ Returns a progress bar-wrapped iterator over a file that reads fixed-size chunks.
Parameters: - fileobj – A file-like object.
- chunksize – The maximum size in bytes of each chunk.
Returns: An iterable over the chunks of the file.
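Leaving the progress bar aside, fixed-size chunked iteration over a file object can be sketched as follows. The generator name `iter_chunks` is hypothetical.

```python
import io
from typing import IO, Iterator


def iter_chunks(fileobj: IO, chunksize: int = 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most `chunksize` bytes until EOF."""
    while True:
        chunk = fileobj.read(chunksize)
        if not chunk:  # an empty read signals end of file
            break
        yield chunk


chunks = list(iter_chunks(io.BytesIO(b"abcdefghij"), chunksize=4))
```

The final chunk may be shorter than chunksize, which is why the parameter is described as a maximum.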
-
xphyle.progress.
pv_command
()¶ Default system wrapper command.
-
xphyle.progress.
system_progress_command
()¶ Resolve a system-level progress bar command.
Parameters: - exe – The executable name or absolute path.
- args – A list of additional command line arguments.
- require – Whether to raise an exception if the command does not exist.
Returns: A tuple of (executable_path, *args).
xphyle.urls module¶
Methods for handling URLs.
-
xphyle.urls.
get_url_file_name
(response: Any, parsed_url: Optional[urllib.parse.ParseResult] = None) → Optional[str]¶ If a response object has HTTP-like headers, extract the filename from the Content-Disposition header.
Parameters: - response – A response object returned by open_url.
- parsed_url – The result of calling parse_url.
Returns: The file name, or None if it could not be determined.
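One way to extract a file name from a Content-Disposition value is to reuse the header parsing in the standard library's email package, as sketched below. Falling back to the last component of the URL path is an assumption made for this example, and `file_name_from_headers` is a hypothetical helper that takes the raw header value instead of a response object.

```python
import posixpath
from email.message import Message
from typing import Optional
from urllib.parse import urlparse


def file_name_from_headers(
    content_disposition: Optional[str], url: str
) -> Optional[str]:
    """Prefer the filename parameter of the header; else use the URL path."""
    if content_disposition:
        msg = Message()
        msg["Content-Disposition"] = content_disposition
        name = msg.get_filename()  # handles quoting of the filename parameter
        if name:
            return name
    # Assumed fallback: the final path component of the URL, if any.
    name = posixpath.basename(urlparse(url).path)
    return name or None
```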
-
xphyle.urls.
get_url_mime_type
(response: Any) → Optional[str]¶ If a response object has HTTP-like headers, extract the MIME type from the Content-Type header.
Parameters: response – A response object returned by open_url. Returns: The content type, or None if the response lacks a ‘Content-Type’ header.
-
xphyle.urls.
open_url
(url_string: str, byte_range: Optional[Tuple[int, int]] = None, headers: Optional[dict] = None, **kwargs) → Any¶ Open a URL for reading.
Parameters: - url_string – A valid url string.
- byte_range – Range of bytes to read (start, stop).
- headers – dict of request headers.
- kwargs – Additional arguments to pass to urlopen.
Returns: A response object, or None if the URL is not valid or cannot be opened.
Notes
The return value of urlopen is only guaranteed to have certain methods, not to be of any specific type, thus the Any return type. Furthermore, the response may be wrapped in an io.BufferedReader to ensure that a peek method is available.
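A byte range is normally requested over HTTP with a Range header. The sketch below builds such a request with urllib without opening a connection. Whether the library formats the header exactly this way is an assumption, and `build_request` is a hypothetical helper. In HTTP, both ends of a byte range are inclusive.

```python
from typing import Dict, Optional, Tuple
from urllib.request import Request


def build_request(
    url: str,
    byte_range: Optional[Tuple[int, int]] = None,
    headers: Optional[Dict[str, str]] = None,
) -> Request:
    """Create a Request, adding a Range header when a byte range is given."""
    all_headers = dict(headers or {})  # copy so the caller's dict is untouched
    if byte_range is not None:
        start, stop = byte_range
        all_headers["Range"] = f"bytes={start}-{stop}"
    return Request(url, headers=all_headers)


request = build_request("http://example.com/data.gz", byte_range=(0, 99))
range_header = request.get_header("Range")
```

The resulting object could then be passed to `urllib.request.urlopen`; servers that do not support ranges may ignore the header and return the full content.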
-
xphyle.urls.
parse_url
(url_string: str) → Optional[urllib.parse.ParseResult]¶ Attempts to parse a URL.
Parameters: url_string – String to test. Returns: A 6-tuple, as described in urlparse
, or None if the URL cannot be parsed, or if it lacks a minimum set of attributes. Note that a URL may be valid and still not be openable (for example, if the scheme is not recognized by urlopen).
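A sketch of this style of validation using `urllib.parse.urlparse`. Treating a scheme and a network location as the minimum set of attributes is an assumption made for this example, and `parse_url_sketch` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import ParseResult, urlparse


def parse_url_sketch(url_string: str) -> Optional[ParseResult]:
    """Return the parsed URL, or None if it lacks a scheme or netloc."""
    try:
        parsed = urlparse(url_string)
    except ValueError:
        return None
    # Assumed minimum: both a scheme and a network location.
    if not (parsed.scheme and parsed.netloc):
        return None
    return parsed


parsed = parse_url_sketch("http://example.com/a.gz")
```

A plain relative path such as `data/a.gz` has no scheme and is therefore rejected, which lets callers distinguish local paths from URLs.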