Sparse Optimization: Course Outline for the Theory and Applications Part

1. Lecture 1 outline [pdf], homework [png]
2. Lecture 2 outline [pdf]
3. Lecture 3 outline [pdf], supplementary material [pdf] [ppt], paper to read [pdf]
4. Lecture 4 outline [pdf], reading: the Candes-Tao proof of exact recovery based on the RIP
5. Lecture 5 outline [pdf], supplementary material [pdf]
6. Lecture 6 outline [pdf], reading: the Candes-Romberg-Tao paper on stable signal recovery
7. Talk at the Optimization and Applications workshop: Sparse Signal Reconstruction by Iterative Support Detection [pdf] [pdf]

Sparse Optimization: Course Outline and References for the Algorithms Part

1. (Nonmonotone line search) Hongchao Zhang and William Hager, A nonmonotone line search technique and its application to unconstrained optimization
2. (BB method with guaranteed convergence) Marcos Raydan, The Barzilai and Borwein gradient method for the large scale unconstrained minimization problem
3. Ernesto G. Birgin, Jose Martinez, Marcos Raydan, Nonmonotone projected gradient methods on convex sets
4. Mario A. T. Figueiredo, Robert D. Nowak, Stephen J. Wright, Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems
5. Ewout van den Berg, Michael Friedlander, Probing the Pareto frontier for basis pursuit solutions

1. Elaine Hale, Wotao Yin, Yin Zhang, Fixed-point continuation for l1-minimization: methodology and convergence
2. Zaiwen Wen, Wotao Yin, Donald Goldfarb, Yin Zhang,  A fast algorithm for sparse reconstruction based on shrinkage, subspace optimization and continuation

1. Wotao Yin, Stanley Osher, Donald Goldfarb,  Jerome Darbon, Bregman Iterative Algorithms for l1-Minimization with Applications to Compressed Sensing
2. Junfeng Yang, Yin Zhang, Alternating direction algorithms for l1-problems in Compressed Sensing
3. Tom Goldstein, Stanley Osher, The Split Bregman Method for L1-Regularized Problems
4. B.S. He, H. Yang, S.L. Wang, Alternating Direction Method with Self-Adaptive Penalty Parameters for Monotone Variational Inequalities

1. Xiaoqun Zhang, Martin Burger, Stanley Osher, A unified primal-dual algorithm framework based on Bregman iteration
2. Yurii Nesterov, Introductory lectures on convex optimization, a basic course

1. Amir Beck and Marc Teboulle,  A fast iterative shrinkage-thresholding algorithm for linear inverse problems
2. Paul Tseng, On accelerated proximal gradient methods for convex-concave optimization
3. Paul Tseng,  Approximation accuracy, gradient methods and error bound for structured convex optimization
4. N. S. Aybat and G. Iyengar,  A first-order augmented Lagrangian method for compressed sensing

1. Jianfeng Cai, Emmanuel Candes, Zuowei Shen,  Singular value thresholding algorithm for matrix completion
2. Shiqian Ma, Donald Goldfarb, Lifeng Chen, Fixed point and Bregman iterative methods for matrix rank minimization
3. Zaiwen Wen, Wotao Yin, Yin Zhang,  Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm
4. Onureena Banerjee, Laurent El Ghaoui, Alexandre d’Aspremont, Model Selection Through Sparse Maximum Likelihood Estimation for Multivariate Gaussian or Binary Data
5. Zhaosong Lu, Smooth optimization approach for sparse covariance selection

(There are many more related references; we list only those discussed in depth in the course.)

Convolution / Circulant Sampling in 1D and 2D

Random linear projections in Toeplitz/circulant/convolution form are relatively easy to implement, and concrete work using them has already appeared in communications, imaging, MRI, and other areas. Compared with Gaussian random matrices of the same size, Toeplitz and circulant matrices have far fewer degrees of freedom, so one can say they are less random. The randomness of the linear projection matters a great deal in compressed sensing: it largely guarantees that the encoding linear projection is incoherent with the sparsifying basis of the signal, and this incoherence is what allows the signal to be recovered from very few samples. Since Toeplitz and circulant matrices are less random than Gaussian matrices, does using them require more samples? The answer is no. Moreover, with them, the recovery algorithms run faster. How to generate Toeplitz and circulant matrices, where to place the samples, the lowest sampling rates, and fast recovery methods are described here (my coauthors are Simon Morgan, Junfeng Yang, and Yin Zhang). Interested colleagues are welcome to request our code.
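As a rough, self-contained illustration of why circulant sampling is cheap (my own NumPy sketch, not code from the paper; the sizes and random seed are made up), one can apply an n-by-n circulant matrix to a sparse signal with a pair of FFTs, without ever forming the matrix, and keep only a random subset of the output entries as the compressive samples:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 64, 16, 3   # signal length, number of samples, sparsity (made-up sizes)

v = rng.standard_normal(n)          # one random vector defines the whole circulant matrix
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)   # a k-sparse signal

# Apply the full n-by-n circulant matrix with two FFTs, in O(n log n) time,
# without ever forming the matrix ...
Cx = (np.fft.ifft(np.fft.ifft(v) * np.fft.fft(x)) * n).real
# ... then keep only m randomly chosen entries as the compressive samples
rows = rng.choice(n, m, replace=False)
b = Cx[rows]
```

The recovery solver only ever needs matrix-vector products with the circulant matrix and its adjoint, both of which reduce to FFTs in the same way; that is the source of the speedup over dense Gaussian sampling.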

This part describes how to implement circulant sampling for both 1D signals and 2D images. It serves as an orientation page for the algorithms described in this work. First, let us practice generating a standard 1D circulant matrix in Matlab.

Given an n-dimensional 1D vector x and circulant matrix C, one obtains the circulant samples as a matrix-vector product Cx. A circulant matrix is determined by its first row; the other rows are circular shifts of the first row. In Matlab, one can call C = gallery('circul',v) to generate C whose first row is v. The circulant matrix has a spectral representation: $C=(1/n) F^* \mbox{diag}(F^*v) F$, or alternatively $C=F\mbox{diag}(Fv)F^{-1}$, where F and $F^*$ are the Fourier transform and its adjoint, respectively, and v is the first row of C. Therefore, given a vector v, the following Matlab code will yield the same C, C1, and C2:

v = randn(1,10);
n = length(v);
F = dftmtx(n);
C = gallery('circul',v);
C1 = F*diag(F*v(:))/F;
C2 = F'*diag(F'*v(:))*F/n;

To verify, one can run norm(C(:)-C1(:)) and norm(C(:)-C2(:)) and check how small the results are. Often we do not need C itself but the matrix-vector product Cx. Thanks to the structure of C, having v or its Fourier transform Fv is enough for computing this product. Given a vector x of length n, the following code will give the same prod1, prod2, prod3, and prod4:

x = randn(n,1);
prod1 = C*x;
prod2 = fft(fft(v(:)).*ifft(x));
prod3 = ifft(ifft(v(:)).*fft(x))*n;
temp = conv(v(end:-1:1).',[x(:); x(:)]);
prod4 = temp(n:2*n-1);

The formulas for computing prod2 and prod3 are based on the spectral decomposition of C, and they take advantage of the fast Fourier transform. prod4 is computed as the linear convolution between v (in reversed order) and a periodically extended copy of x; the extension supplies the wraparound terms of the circular shifts. This should not be surprising, since Cx consists of the inner products between x and v at all of its circularly shifted locations. When Cx needs to be computed many times with different x, for better speed I recommend using the second or third method above and pre-computing fft(v(:)) or ifft(v(:)), respectively.
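For readers without Matlab, the same identities can be checked in NumPy (my own translation of the code above, not part of the original toolbox; np.roll stands in for gallery('circul') by building each row as a circular shift of v):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10
v = rng.standard_normal(n)
x = rng.standard_normal(n)

# Explicit circulant matrix: row i is v circularly shifted right by i,
# matching Matlab's gallery('circul',v)
C = np.array([np.roll(v, i) for i in range(n)])

prod1 = C @ x
# Spectral formulas, the analogs of the two Matlab FFT lines above
prod2 = np.fft.fft(np.fft.fft(v) * np.fft.ifft(x))
prod3 = np.fft.ifft(np.fft.ifft(v) * np.fft.fft(x)) * n
# Linear convolution of reversed v with a periodic extension of x
temp = np.convolve(v[::-1], np.concatenate([x, x]))
prod4 = temp[n - 1:2 * n - 1]

assert np.allclose(prod1, prod2.real)
assert np.allclose(prod1, prod3.real)
assert np.allclose(prod1, prod4)
```

For real v and x the FFT-based products are real up to roundoff, which is why only the real parts are compared.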

To move from 1D to 2D, we can extend the 1D spectral representation $C=(1/n) F^* \mbox{diag}(F^*v) F$ by letting F be a 2D Fourier transform and v be a 2D array. Note that C is no longer a matrix but a linear operator on 2D arrays, and Cx yields a 2D array consisting of the inner products between x and the 2D array v at all of its shifted locations. The shifts are two-way: left-right and up-down. In analogy with the 1D case, Cx becomes a certain 2D circular convolution of x.

The following code demonstrates five different ways of computing Cx in Matlab, where x and v are randomly generated.

% ----- Sec. 1 -----
nRows = ceil(100*rand);
nCols = ceil(100*rand);
x = randn(nRows,nCols);
v = randn(nRows,nCols);
% ----- Sec. 2 -----
prod1 = zeros(nRows,nCols);
dot2 = @(x,y) sum(x(:).*y(:));
for jj = 1:nCols
for ii = 1:nRows
prod1(ii,jj) = dot2(v,x);
v = circshift(v,[1 0]);
end
v = circshift(v,[0 1]);
end
% ----- Sec. 3 -----
prod2 = fft2(fft2(v).*ifft2(x));
prod3 = ifft2(ifft2(v).*fft2(x))*nRows*nCols;
% ----- Sec. 4 -----
psf = flipud(fliplr(circshift(v, [ceil(nRows/2)-1 ceil(nCols/2)-1])));
% ----- Sec. 5 -----
prod4 = ifft2(psf2otf(psf).*fft2(x));
prod5 = imfilter(x, psf, 'conv', 'circular');

Sec. 1 generates x and v of random sizes and endows their entries with random values. They are real-valued here, but they could take complex values as well. Cx is computed and stored in prod1, prod2, prod3, prod4, and prod5, all of which have the same size as x and v. They correspond to the moving-mask realization (prod1), FFT realizations (prod2 and prod3), optical realization (prod4), and circular-convolution realization (prod5), respectively.

Sec. 2 illustrates how to compute Cx by taking dot products (i.e., inner products) between x and v at all of its shifted locations. Each pass through the inner loop (counted by ii) shifts v downward one step, and each pass through the outer loop shifts v to the right one step. One can imagine that x is a static scene and v a moving mask. Unlike the 1D case, v moves not in just one but in both directions. At the end of the nested loops, v returns to its original position, as if it had never been shifted. Based on the spectral representation of C, Sec. 3 shows how to use fft2 and ifft2 to compute Cx quickly. Sec. 5 provides two more short and fast ways. To use them, however, v must first be converted to a point-spread function (psf). This conversion, given in Sec. 4, shifts v horizontally and vertically by about half of its total width and height, respectively, and then applies a left-right flip and an upside-down flip. This aligns psf with Matlab's psf2otf function and its 2D convolution. The conversion is invertible, as v and psf satisfy
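The moving-mask and FFT realizations above can be spot-checked in NumPy (again my own illustrative translation, with np.roll in place of circshift and made-up sizes):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 7                       # nRows, nCols (made-up sizes)
x = rng.standard_normal((m, n))
v = rng.standard_normal((m, n))

# Moving-mask realization: inner products of x with every circular shift of v
prod1 = np.empty((m, n))
for i in range(m):
    for j in range(n):
        prod1[i, j] = np.sum(np.roll(v, (i, j), axis=(0, 1)) * x)

# FFT realizations based on the 2D spectral representation
prod2 = np.fft.fft2(np.fft.fft2(v) * np.fft.ifft2(x))
prod3 = np.fft.ifft2(np.fft.ifft2(v) * np.fft.fft2(x)) * (m * n)

assert np.allclose(prod1, prod2.real)
assert np.allclose(prod1, prod3.real)
```

The nested loop is quadratic in the mask size and is only there as a ground truth; the FFT versions are the ones to use in practice.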

v = circshift(fliplr(flipud(psf)), [1-ceil(nRows/2) 1-ceil(nCols/2)]);

Comparing the lines for prod3 and prod4, we also establish ifft2(v)*nRows*nCols = psf2otf(psf). There is really nothing magic here: this explains how Matlab implements psf2otf; it first converts the input psf to v and then applies ifft2(v)*nRows*nCols. If one calls otf = psf2otf(psf,outsize) with outsize larger than the size of psf, then psf is first padded to that larger size.
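NumPy has no psf2otf, but the identity can still be verified with a minimal stand-in that circularly shifts the psf center to index (0,0) before taking fft2 (my sketch, valid under the assumption that outsize equals the psf size; note (m + 1) // 2 equals ceil(m/2) for positive integers):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 9                       # nRows, nCols (made-up sizes)
v = rng.standard_normal((m, n))

# psf from v, as in Sec. 4 above (np.roll plays the role of circshift)
psf = np.flipud(np.fliplr(np.roll(v, ((m + 1) // 2 - 1, (n + 1) // 2 - 1),
                                  axis=(0, 1))))

# Minimal psf2otf: move the psf center to index (0, 0), then apply fft2
otf = np.fft.fft2(np.roll(psf, (-(m // 2), -(n // 2)), axis=(0, 1)))
assert np.allclose(otf, np.fft.ifft2(v) * m * n)

# The inverse conversion recovers v
v_back = np.roll(np.fliplr(np.flipud(psf)),
                 (1 - (m + 1) // 2, 1 - (n + 1) // 2), axis=(0, 1))
assert np.allclose(v, v_back)
```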

Our code RecPC for 2D compressive imaging requires computing Cx, and it uses the form of prod4. The above description hopefully explains how to translate the computation to the physics of random circulant / convolution / Toeplitz compressive sampling, and vice versa.
