To ban or not to ban, that is the pickle
Hugging Face has become a significant platform in the machine learning (ML) community, particularly for its support of various model formats. Among these, Pickle sees considerable usage, primarily because PyTorch, a popular ML library in Python, has historically relied on Pickle for the serialization and deserialization of models. Pickle is Python's core module for object serialization: it transforms an object into a byte stream, a process Python calls pickling, while the reverse process, deserialization, is called unpickling.
Deserialization is especially risky when the input comes from an untrusted source; deserializing attacker-controlled data has produced numerous remote code execution vulnerabilities across many programming languages. The Python documentation for Pickle carries a prominent warning emphasizing the risk: “It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.”
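The warning is easy to demonstrate. A minimal sketch using only the standard library: the `__reduce__` hook lets an object tell Pickle to call an arbitrary callable during unpickling. Here the callable is a harmless `print`, but an attacker could just as easily point it at `os.system`.

```python
import pickle

class Malicious:
    # __reduce__ returns (callable, args); pickle invokes the callable
    # with those args while unpickling. We use print to keep it harmless.
    def __reduce__(self):
        return (print, ("arbitrary code ran during unpickling",))

payload = pickle.dumps(Malicious())

# The victim only has to call pickle.loads -- the callable runs immediately,
# before any "model" object is ever handed back to the caller.
pickle.loads(payload)
```

Note that the victim does not need to use the result of `pickle.loads` at all; the code executes as a side effect of deserialization itself.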
This situation presents a dilemma for platforms like Hugging Face, which encourage users to share models openly and therefore routinely host pickled model data. On one hand, there is a risk of abuse: malicious actors might upload compromised models designed to execute harmful code on the machines of anyone who loads them. On the other hand, banning the Pickle format outright would be overly restrictive, given how widely PyTorch is used in the community. Hugging Face has opted for a middle path, scanning uploaded Pickle files and flagging potential threats before users access them.
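The general idea behind such scanning can be sketched with the standard library's `pickletools` module, which walks a pickle's opcode stream without ever executing it. This is an illustrative simplification, not Hugging Face's actual scanner: it flags opcodes that can import or invoke arbitrary objects, such as `GLOBAL` and `REDUCE`.

```python
import pickle
import pickletools

# Opcodes that can pull in or call arbitrary objects during unpickling.
# A real scanner would be more nuanced (allow-listing known-safe globals,
# for example); this set is an assumption made for the sketch.
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "REDUCE"}

def scan_pickle(data: bytes) -> list:
    """Return names of suspicious opcodes found in a pickle stream,
    without executing any of it."""
    return [op.name for op, arg, pos in pickletools.genops(data)
            if op.name in SUSPICIOUS]

# A pickle of plain data contains none of the flagged opcodes...
print(scan_pickle(pickle.dumps({"weights": [1.0, 2.0]})))

# ...while a payload built with __reduce__ does.
class Payload:
    def __reduce__(self):
        return (print, ("pwned",))

print(scan_pickle(pickle.dumps(Payload())))
```

Because the scan never calls `pickle.loads`, it is safe to run on untrusted files; the trade-off is that a static opcode scan can only flag suspicious patterns, not prove a file benign.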
Source: www.csoonline.com