Will selective unlearning become a new branch of deep learning? Just wondering!
When is unlearning equivalent or analogous to censorship or whitewashing?
Caveat: I did not read the paper.
From the abstract:
"Large language models (LLMs) can learn to produce sensitive outputs which model deployers may wish to reduce, motivating the use of output suppression (LLM unlearning) methods.
We demonstrate that taking only a few uphill Gauss-Newton steps on a forget set provides a conceptually simple, state-of-the-art unlearning algorithm that is underexplored in the LLM literature.
We show that these steps can be efficiently and accurately implemented for LLMs using parametric Hessian approximations such as K-FAC. We call this approach K-FAC for Distribution Erasure (K-FADE).
Our evaluations demonstrate that K-FADE performs competitively with or better than previous unlearning approaches for LLMs across standard benchmarks. Specifically, K-FADE approximates the output distribution of models re-finetuned with certain data excluded on the ToFU unlearning benchmark. K-FADE also effectively suppresses outputs from a specific distribution while minimally altering the model's outputs on non-targeted data from the WMDP benchmark."
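Since I did not read the paper, here is only my rough, hypothetical sketch of what "a few uphill Gauss-Newton steps on a forget set" might look like in PyTorch. All names here are mine, and I substitute a crude diagonal Fisher for the K-FAC curvature approximation the paper actually uses:

import torch

def uphill_gauss_newton_step(model, inputs, targets, loss_fn,
                             lr=1e-3, damping=1e-4):
    # One gradient-ascent step on a forget-set batch, preconditioned by a
    # diagonal curvature proxy (squared gradients) standing in for K-FAC.
    model.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            fisher_diag = p.grad.pow(2) + damping  # crude diagonal Fisher proxy
            p.add_(lr * p.grad / fisher_diag)      # uphill: increase forget loss
    return loss.item()

Calling something like this a handful of times on forget-set batches would be the "few uphill steps" the abstract describes, if I read it right: the same curvature-preconditioned direction ordinary training follows downhill, followed uphill instead so the forget distribution is suppressed while the parameters move as little as possible.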
Here is another recent unlearning paper: "OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics".