# Welcome to libarchive! The libarchive project develops a portable, efficient C library that can read and write streaming archives in a variety of formats. It also includes implementations of the common `tar`, `cpio`, and `zcat` command-line tools that use the libarchive library. ## Questions? Issues? * http://www.libarchive.org is the home for ongoing libarchive development, including documentation, and links to the libarchive mailing lists. * To report an issue, use the issue tracker at https://github.com/libarchive/libarchive/issues * To submit an enhancement to libarchive, please submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls ## Contents of the Distribution This distribution bundle includes the following major components: * **libarchive**: a library for reading and writing streaming archives * **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive * **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality * **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such * **examples**: Some small example programs that you may find useful. * **examples/minitar**: a compact sample demonstrating use of libarchive. * **contrib**: Various items sent to me by third parties; please contact the authors with any questions. The top-level directory contains the following information files: * **NEWS** - highlights of recent changes * **COPYING** - what you can do with this * **INSTALL** - installation instructions * **README** - this file * **CMakeLists.txt** - input for "cmake" build tool, see INSTALL * **configure** - configuration script, see INSTALL for details. If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`). The following files in the top-level directory are used by the 'configure' script: * `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers * `Makefile.in`, `config.h.in` - templates used by configure script ## Documentation In addition to the informational articles and documentation in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki), the distribution also includes a number of manual pages: * bsdtar.1 explains the use of the bsdtar program * bsdcpio.1 explains the use of the bsdcpio program * bsdcat.1 explains the use of the bsdcat program * libarchive.3 gives an overview of the library as a whole * archive_read.3, archive_write.3, archive_write_disk.3, and archive_read_disk.3 provide detailed calling sequences for the read and write APIs * archive_entry.3 details the "struct archive_entry" utility class * archive_internals.3 provides some insight into libarchive's internal structure and operation. * libarchive-formats.5 documents the file formats supported by the library * cpio.5, mtree.5, and tar.5 provide detailed information about these popular archive formats, including hard-to-find details about modern cpio and tar variants. The manual pages above are provided in the 'doc' directory in a number of different formats. You should also read the copious comments in `archive.h` and the source code for the sample programs for more details. Please let us know about any errors or omissions you find. ## Supported Formats Currently, the library automatically detects and reads the following fomats: * Old V7 tar archives * POSIX ustar * GNU tar format (including GNU long filenames, long link names, and sparse files) * Solaris 9 extended tar format (including ACLs) * POSIX pax interchange format * POSIX octet-oriented cpio * SVR4 ASCII cpio * POSIX octet-oriented cpio * Binary cpio (big-endian or little-endian) * ISO9660 CD-ROM images (with optional Rockridge or Joliet extensions) * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives) * GNU and BSD 'ar' archives * 'mtree' format * 7-Zip archives * Microsoft CAB format * LHA and LZH archives * RAR archives (with some limitations due to RAR's proprietary status) * XAR archives The library also detects and handles any of the following before evaluating the archive: * uuencoded files * files with RPM wrapper * gzip compression * bzip2 compression * compress/LZW compression * lzma, lzip, and xz compression * lz4 compression * lzop compression The library can create archives in any of the following formats: * POSIX ustar * POSIX pax interchange format * "restricted" pax format, which will create ustar archives except for entries that require pax extensions (for long filenames, ACLs, etc). * Old GNU tar format * Old V7 tar format * POSIX octet-oriented cpio * SVR4 "newc" cpio * shar archives * ZIP archives (with uncompressed or "deflate" compressed entries) * GNU and BSD 'ar' archives * 'mtree' format * ISO9660 format * 7-Zip archives * XAR archives When creating archives, the result can be filtered with any of the following: * uuencode * gzip compression * bzip2 compression * compress/LZW compression * lzma, lzip, and xz compression * lz4 compression * lzop compression ## Notes about the Library Design The following notes address many of the most common questions we are asked about libarchive: * This is a heavily stream-oriented system. That means that it is optimized to read or write the archive in a single pass from beginning to end. For example, this allows libarchive to process archives too large to store on disk by processing them on-the-fly as they are read from or written to a network or tape drive. This also makes libarchive useful for tools that need to produce archives on-the-fly (such as webservers that provide archived contents of a users account). * In-place modification and random access to the contents of an archive are not directly supported. For some formats, this is not an issue: For example, tar.gz archives are not designed for random access. In some other cases, libarchive can re-open an archive and scan it from the beginning quickly enough to provide the needed abilities even without true random access. Of course, some applications do require true random access; those applications should consider alternatives to libarchive. * The library is designed to be extended with new compression and archive formats. The only requirement is that the format be readable or writable as a stream and that each archive entry be independent. There are articles on the libarchive Wiki explaining how to extend libarchive. * On read, compression and format are always detected automatically. * The same API is used for all formats; in particular, it's very easy for software using libarchive to transparently handle any of libarchive's archiving formats. * Libarchive's automatic support for decompression can be used without archiving by explicitly selecting the "raw" and "empty" formats. * I've attempted to minimize static link pollution. If you don't explicitly invoke a particular feature (such as support for a particular compression or format), it won't get pulled in to statically-linked programs. In particular, if you don't explicitly enable a particular compression or decompression support, you won't need to link against the corresponding compression or decompression libraries. This also reduces the size of statically-linked binaries in environments where that matters. * The library is generally _thread safe_ depending on the platform: it does not define any global variables of its own. However, some platforms do not provide fully thread-safe versions of key C library functions. On those platforms, libarchive will use the non-thread-safe functions. Patches to improve this are of great interest to us. * In particular, libarchive's modules to read or write a directory tree do use `chdir()` to optimize the directory traversals. This can cause problems for programs that expect to do disk access from multiple threads. * The library is _not_ thread aware, however. It does no locking or thread management of any kind. If you create a libarchive object and need to access it from multiple threads, you will need to provide your own locking. * On read, the library accepts whatever blocks you hand it. Your read callback is free to pass the library a byte at a time or mmap the entire archive and give it to the library at once. On write, the library always produces correctly-blocked output. * The object-style approach allows you to have multiple archive streams open at once. bsdtar uses this in its "@archive" extension. * The archive itself is read/written using callback functions. You can read an archive directly from an in-memory buffer or write it to a socket, if you wish. There are some utility functions to provide easy-to-use "open file," etc, capabilities. * The read/write APIs are designed to allow individual entries to be read or written to any data source: You can create a block of data in memory and add it to a tar archive without first writing a temporary file. You can also read an entry from an archive and write the data directly to a socket. If you want to read/write entries to disk, there are convenience functions to make this especially easy. * Note: "pax interchange format" is really an extended tar format, despite what the name says.