Adding a Custom Filesystem Plugin

Adding a Custom Filesystem Plugin

Background

The TensorFlow framework is often used in multi-process and multi-machine environments, such as Google data centers, Google Cloud Machine Learning, Amazon Web Services (AWS), and on-site distributed clusters. In order to both share and save certain types of state produced by TensorFlow, the framework assumes the existence of a reliable, shared filesystem. This shared filesystem has numerous uses, for example:

  • Checkpoints of state are often saved to a distributed filesystem for reliability and fault-tolerance.
  • Training processes communicate with TensorBoard by writing event files to a directory, which TensorBoard watches. A shared filesystem allows this communication to work even when TensorBoard runs in a different process or machine.

There are many different implementations of shared or distributed filesystems in the re