Man page - gejsv(3)
Packages contains this manual
- hptrd(3)
- potri(3)
- xerbla_array(3)
- ggsvd_driver_grp(3)
- hfrk(3)
- getsqr_comp_grp(3)
- laed6(3)
- gtrfs(3)
- lasdq(3)
- gglse(3)
- la_xisnan_la_isnan(3)
- unmr2(3)
- hetrs_aa(3)
- tpttr(3)
- gerz_comp_grp(3)
- potrf(3)
- hegv_driver(3)
- laqps(3)
- ggqr_comp_grp(3)
- ilalc(3)
- ung2r(3)
- heevd(3)
- pstf2(3)
- lacn2(3)
- ptrfs(3)
- ungrq(3)
- gelqf(3)
- ppsv_comp(3)
- blas2_full(3)
- gemlqt(3)
- unml2(3)
- tplqt(3)
- tpcon(3)
- getf2(3)
- ggbak(3)
- bdsvd_driver(3)
- lamch(3)
- gelq(3)
- gebal(3)
- laqr1(3)
- ptsvx(3)
- lahr2(3)
- larscl2(3)
- geqrt(3)
- larfb(3)
- gtsv_comp(3)
- gesvd_aux(3)
- hbevx_2stage(3)
- hbgvx(3)
- tprfs(3)
- params_grp(3)
- lahef(3)
- laqr_group(3)
- unmqr(3)
- tgsy2(3)
- tfsv_comp(3)
- ggls_driver_grp(3)
- geev(3)
- latrd(3)
- unbdb4(3)
- bbcsd(3)
- lange(3)
- gelq_comp3(3)
- gttrs(3)
- lasy2(3)
- hetf2_rook(3)
- gtsv(3)
- lalsd(3)
- lanhb(3)
- laqhb(3)
- hgeqz(3)
- gesvj(3)
- gsvj0(3)
- ungtsqr_row(3)
- gelq_comp1(3)
- gemmtr(3)
- pbequ(3)
- heev_driver(3)
- unhr_col(3)
- syconvf_rook(3)
- getc2(3)
- syconv(3)
- norm_grp(3)
- larrc(3)
- laqr4(3)
- posv_comp(3)
- geev_driver_grp(3)
- heev_comp(3)
- pfsv(3)
- trevc3(3)
- gesv_driver_grp(3)
- reflector_aux_grp(3)
- langt(3)
- lacrt(3)
- latdf(3)
- hetrs_aa_2stage(3)
- lamc1(3)
- hpev_driver(3)
- hegvd(3)
- pptri(3)
- geqrt3(3)
- gelqt3(3)
- lasd5(3)
- laeda(3)
- geqr(3)
- lamtsqr(3)
- heev(3)
- hpev_comp(3)
- larfg(3)
- blas2_grp(3)
- hesv_rook(3)
- laexc(3)
- hetrd(3)
- geesx(3)
- ppsvx(3)
- blas_top(3)
- gtts2(3)
- la_herpvgrw(3)
- hpevx(3)
- ggevx(3)
- lahqr(3)
- gelq_comp_grp(3)
- hesv_comp_v3(3)
- tplqt2(3)
- hpev(3)
- hbtrd(3)
- getrs(3)
- hecon_3(3)
- lasrt(3)
- lanhe(3)
- gesv_comp(3)
- gbequ(3)
- hetrf_rk(3)
- laqr3(3)
- heev_comp_grp(3)
- ungtsqr(3)
- ppcon(3)
- ggrq_comp_grp(3)
- larmm(3)
- ieeeck(3)
- geqrf(3)
- solve_aux_grp(3)
- herfs(3)
- posvx(3)
- posvxx(3)
- gges3(3)
- hbgvd(3)
- lantb(3)
- lasd_comp_grp(3)
- hpgvx(3)
- lapy2(3)
- lauu2(3)
- copy(3)
- getsqrhrt(3)
- stev_comp_grp(3)
- laev2(3)
- larfb_gett(3)
- trti2(3)
- laqz4(3)
- hegv_driver_grp(3)
- la_porfsx_extended(3)
- laruv(3)
- ggsvd_comp_grp(3)
- dot(3)
- gehd2(3)
- lanhf(3)
- hetri_rook(3)
- pfsv_comp(3)
- gbtrf(3)
- hpgst(3)
- getri(3)
- trevc(3)
- unmrz(3)
- hsein(3)
- lsamen(3)
- lasd6(3)
- trtri(3)
- ggglm(3)
- las2(3)
- latrs(3)
- lapll(3)
- gemlq(3)
- geqpf_comp_grp(3)
- stemr(3)
- rotm(3)
- disna(3)
- ggrqf(3)
- pptrf(3)
- lasd0(3)
- lals0(3)
- laqz2(3)
- hbev_driver2(3)
- geswlq_comp_grp(3)
- laqr0(3)
- trttp(3)
- stedc(3)
- lasq4(3)
- geev_comp_grp(3)
- ungbr(3)
- lanv2(3)
- hpsv(3)
- pprfs(3)
- gehrd(3)
- ppsv(3)
- lagtm(3)
- hpgv(3)
- trsv_comp(3)
- larfx(3)
- gesv_driver(3)
- gerfsx(3)
- la_geamv(3)
- laed9(3)
- tpqrt2(3)
- uncsd(3)
- gecs_comp_grp(3)
- bdsqr(3)
- hegv_comp_grp(3)
- labad(3)
- geqp3(3)
- gesvdq(3)
- tfttp(3)
- laln2(3)
- uncsd2by1(3)
- blas2_like_grp(3)
- latbs(3)
- hbgst(3)
- larrv(3)
- ilaenv2stage(3)
- bdsvdx(3)
- hegs2(3)
- lasq_comp_grp(3)
- hpr2(3)
- laqhe(3)
- larra(3)
- gemqrt(3)
- hbmv(3)
- hpsv_driver(3)
- lacp2(3)
- lapmt(3)
- gecon(3)
- unbdb5(3)
- la_gerpvgrw(3)
- tgex2(3)
- laqhp(3)
- tftri(3)
- getrf2(3)
- porfs(3)
- lartg(3)
- lagts(3)
- ggev_comp_grp(3)
- lasd3(3)
- geqr_comp2(3)
- laqz_group(3)
- pftri(3)
- hetri2x(3)
- lahef_aa(3)
- svd_driver_grp(3)
- gbsv_driver(3)
- hesv_comp_aasen2(3)
- laqtr(3)
- lag2(3)
- la_porcond(3)
- hbev(3)
- pbtrf(3)
- lascl(3)
- larr_comp_grp(3)
- hecon(3)
- pttrs(3)
- lasd8(3)
- lsame(3)
- unm2l(3)
- potrs(3)
- tptrs(3)
- lartv(3)
- trtrs(3)
- gsvj1(3)
- sum1(3)
- larrj(3)
- gbmv(3)
- posv(3)
- gghd3(3)
- geev_top(3)
- geqr_comp_grp(3)
- laset(3)
- hesvxx(3)
- posv_comp_grp(3)
- lahef_rk(3)
- lasd1(3)
- tprfb(3)
- potf2(3)
- laein(3)
- lamc4(3)
- stevd(3)
- gtsv_driver(3)
- gesvd_comp_grp(3)
- la_constants(3)
- gesvx(3)
- hseqr(3)
- launhr_col_getrfnp2(3)
- trcon(3)
- larre(3)
- gelsy(3)
- ptsv(3)
- lacon(3)
- laed_comp_grp(3)
- hpsvx(3)
- gemm(3)
- poequ(3)
- laesy(3)
- lagtf(3)
- trrfs(3)
- ggev3(3)
- pbstf(3)
- poequb(3)
- heevr(3)
- lanhp(3)
- unbdb3(3)
- tgsyl(3)
- lamc5(3)
- geqr2p(3)
- ungqr(3)
- laqz3(3)
- imax1(3)
- gels_top(3)
- hesv(3)
- gelqt(3)
- pfsv_driver(3)
- stegr(3)
- gerqf(3)
- laisnan(3)
- ilatrans(3)
- gbsv_comp(3)
- pbrfs(3)
- lascl2(3)
- larz(3)
- la_hercond(3)
- tgexc(3)
- ggesx(3)
- unbdb6(3)
- ungl2(3)
- laed_comp2(3)
- rscl(3)
- hegv(3)
- gelst(3)
- gbtrs(3)
- pftrf(3)
- langb(3)
- lantr(3)
- laqgb(3)
- ggsvp3(3)
- bdsdc(3)
- ladiv(3)
- laqge(3)
- iparmq(3)
- ggbal(3)
- hb2st_kernels(3)
- lartgs(3)
- lartgp(3)
- rot(3)
- ppequ(3)
- laed3(3)
- her(3)
- hptri(3)
- stevx(3)
- upgtr(3)
- lar2v(3)
- hbev_2stage(3)
- gejsv(3)
- ppsv_driver(3)
- unm22(3)
- gesvxx(3)
- laqz0(3)
- unmtr(3)
- laed5(3)
- tptri(3)
- laed0(3)
- heev_driver2(3)
- hpcon(3)
- lasd4(3)
- hetrf_aa(3)
- geqr_comp3(3)
- rot_aux_grp(3)
- aux_grp(3)
- laebz(3)
- trsyl3(3)
- gges(3)
- gesdd(3)
- trexc(3)
- ung2l(3)
- gesv(3)
- laed4(3)
- md__r_e_a_d_m_e(3)
- blas3_like_grp(3)
- laed1(3)
- larcm(3)
- hbevx(3)
- hesv_driver_grp(3)
- hetrs(3)
- hbevd_2stage(3)
- blas1_grp(3)
- laic1(3)
- geql_comp_grp(3)
- heev_2stage(3)
- hpmv(3)
- pbtf2(3)
- hetrf_aa_2stage(3)
- hbgv(3)
- pptrs(3)
- lapmr(3)
- tpqr_comp_grp(3)
- larfy(3)
- gedmd(3)
- lasr(3)
- hetrd_2stage(3)
- gerfs(3)
- ungtr(3)
- porfsx(3)
- tpmv(3)
- lasd_comp2(3)
- unmbr(3)
- tbtrs(3)
- hetd2(3)
- trsv_comp_grp(3)
- lapy3(3)
- ptts2(3)
- unmhr(3)
- hbev_driver(3)
- lalsa(3)
- tbsv_comp(3)
- hesv_comp_v1(3)
- geql2(3)
- sterf(3)
- larrd(3)
- larft(3)
- lagv2(3)
- gttrf(3)
- tpqrt(3)
- la_lin_berr(3)
- rotg(3)
- solve_top(3)
- lacgv(3)
- larrf(3)
- tbmv(3)
- trsyl(3)
- geequ(3)
- upmtr(3)
- hpgv_driver(3)
- tbsv(3)
- hesvx(3)
- latrz(3)
- tfttr(3)
- gesv_comp_grp(3)
- xerbla_grp(3)
- tpsv(3)
- blas3_grp(3)
- gesvd_driver(3)
- geqr_comp1(3)
- ggev_driver_grp(3)
- la_gbamv(3)
- tpmlqt(3)
- trttf(3)
- larzb(3)
- unmr3(3)
- hecon_rook(3)
- stebz(3)
- lantp(3)
- laqz1(3)
- hesv_rk(3)
- tbcon(3)
- xerbla(3)
- posv_mixed(3)
- latps(3)
- hesv_aa_driver(3)
- gemqr(3)
- larrr(3)
- gebrd(3)
- tgsna(3)
- la_gercond(3)
- gbsv(3)
- hesv_comp_grp(3)
- gesv_mixed(3)
- gghrd(3)
- gbrfs(3)
- tpmqrt(3)
- lasq3(3)
- tpsv_comp(3)
- largv(3)
- gelsd(3)
- pftrs(3)
- asum(3)
- launhr_col_getrfnp(3)
- hptrf(3)
- lacpy(3)
- gesc2(3)
- lasda(3)
- second(3)
- hprfs(3)
- hpsv_comp(3)
- lamrg(3)
- pbsv_comp(3)
- hegv_2stage(3)
- gerq2(3)
- lasdt(3)
- abs1(3)
- hbevd(3)
- hbev_comp(3)
- trsv(3)
- la_porpvgrw(3)
- la_gbrpvgrw(3)
- hbgv_driver(3)
- tgsja(3)
- gebd2(3)
- geqr2(3)
- unm2r(3)
- unmql(3)
- la_gbrfsx_extended(3)
- gelq_comp2(3)
- iparam2stage(3)
- ger(3)
- larf(3)
- ilaprec(3)
- labrd(3)
- unbdb1(3)
- unmlq(3)
- geequb(3)
- la_herfsx_extended(3)
- unbdb2(3)
- lapack_top(3)
- ptsv_driver(3)
- hetrs2(3)
- geqr_comp4(3)
- pbsv(3)
- posv_driver(3)
- steqr(3)
- gels(3)
- lar1v(3)
- hemv(3)
- la_transtype(3)
- hesv_aa(3)
- lacrm(3)
- stevr(3)
- hetf2_rk(3)
- blas2_banded(3)
- stein(3)
- unmrq(3)
- larrk(3)
- hetri2(3)
- hesv_aa_2stage(3)
- pttrf(3)
- gelss(3)
- pbsv_driver(3)
- lasq5(3)
- heevx_2stage(3)
- hetri(3)
- lasd2(3)
- laed2(3)
- pbcon(3)
- ptcon(3)
- laed7(3)
- gels_aux_grp(3)
- hpgvd(3)
- hetf2(3)
- tzrzf(3)
- hpr(3)
- unitary_top(3)
- latsqr(3)
- ungql(3)
- her2(3)
- hetri_3x(3)
- hetrd_hb2st(3)
- tgsen(3)
- ggsvd3(3)
- lasq6(3)
- set_grp(3)
- larfgp(3)
- gels_driver_grp(3)
- pbtrs(3)
- lamswlq(3)
- lanht(3)
- gbsvxx(3)
- tgevc(3)
- ilaenv(3)
- swap(3)
- lae2(3)
- iladiag(3)
- lasq2(3)
- la_heamv(3)
- blas_like_top(3)
- la_gerfsx_extended(3)
- hegst(3)
- tfsm(3)
- gesvd(3)
- ungr2(3)
- ggev(3)
- aux_top(3)
- blas2_packed(3)
- geqlf(3)
- hetrs_rook(3)
- gelq2(3)
- geqrfp(3)
- gbequb(3)
- stev(3)
- lauum(3)
- potrf2(3)
- lamc3(3)
- gbrfsx(3)
- gerq_comp_grp(3)
- pocon(3)
- tbrfs(3)
- heswapr(3)
- lamc2(3)
- hpevd(3)
- hesv_comp_aasen(3)
- scalar_grp(3)
- gemv(3)
- lasv2(3)
- lanhs(3)
- svd_top(3)
- gbsvx(3)
- gesvdx(3)
- tplq_comp_grp(3)
- hesv_driver(3)
- hesv_comp_v2(3)
- trsen(3)
- syconvf(3)
- lasd7(3)
- gbcon(3)
- unbdb(3)
- heev_driver_grp(3)
- ggqrf(3)
- heevx(3)
- gtsvx(3)
- lahef_rook(3)
- hetrf_rook(3)
- hetrf(3)
- trsna(3)
- gebak(3)
- larnv(3)
- ptsv_comp(3)
- laswlq(3)
- lags2(3)
- laed8(3)
- laswp(3)
- hptrs(3)
- unglq(3)
- la_wwaddw(3)
- getrf(3)
- gees(3)
- gbtf2(3)
- hegvx(3)
- latrs3(3)
- roundup_lwork(3)
- unghr(3)
- iamax(3)
- larzt(3)
- pteqr(3)
- ilaver(3)
- trmv(3)
- la_gbrcond(3)
- blas0_like_grp(3)
- nrm2(3)
- heev_top(3)
- gtcon(3)
- heevr_2stage(3)
- pstrf(3)
- rot_comp(3)
- laqr5(3)
- heevd_2stage(3)
- getsls(3)
- hetrd_he2hb(3)
- heequb(3)
- laqp2(3)
- axpy(3)
- blast_aux(3)
- rotmg(3)
- pbsvx(3)
- ilauplo(3)
- herfsx(3)
- laqr2(3)
- blas1_like_grp(3)
- lassq(3)
- larrb(3)
- stev_driver(3)
- geevx(3)
- tpttf(3)
- scal(3)
- laneg(3)
- posv_driver_grp(3)
- lasq1(3)
- hetrs_3(3)
- geqrt2(3)
- gbbrd(3)
- ilalr(3)
- hetri_3(3)
apt-get install liblapack-doc
Manual
gejsv
NAMESYNOPSIS
Functions
Detailed Description
Function Documentation
subroutine cgejsv (character*1 joba, character*1 jobu, character*1 jobv,character*1 jobr, character*1 jobt, character*1 jobp, integer m,integer n, complex, dimension( lda, * ) a, integer lda, real,dimension( n ) sva, complex, dimension( ldu, * ) u, integer ldu,complex, dimension( ldv, * ) v, integer ldv, complex, dimension( lwork) cwork, integer lwork, real, dimension( lrwork ) rwork, integerlrwork, integer, dimension( * ) iwork, integer info)
subroutine dgejsv (character*1 joba, character*1 jobu, character*1 jobv,character*1 jobr, character*1 jobt, character*1 jobp, integer m,integer n, double precision, dimension( lda, * ) a, integer lda, doubleprecision, dimension( n ) sva, double precision, dimension( ldu, * ) u,integer ldu, double precision, dimension( ldv, * ) v, integer ldv,double precision, dimension( lwork ) work, integer lwork, integer,dimension( * ) iwork, integer info)
subroutine sgejsv (character*1 joba, character*1 jobu, character*1 jobv,character*1 jobr, character*1 jobt, character*1 jobp, integer m,integer n, real, dimension( lda, * ) a, integer lda, real, dimension( n) sva, real, dimension( ldu, * ) u, integer ldu, real, dimension( ldv,* ) v, integer ldv, real, dimension( lwork ) work, integer lwork,integer, dimension( * ) iwork, integer info)
subroutine zgejsv (character*1 joba, character*1 jobu, character*1 jobv,character*1 jobr, character*1 jobt, character*1 jobp, integer m,integer n, complex*16, dimension( lda, * ) a, integer lda, doubleprecision, dimension( n ) sva, complex*16, dimension( ldu, * ) u,integer ldu, complex*16, dimension( ldv, * ) v, integer ldv,complex*16, dimension( lwork ) cwork, integer lwork, double precision,dimension( lrwork ) rwork, integer lrwork, integer, dimension( * )iwork, integer info)
Author
NAME
gejsv - gejsv: SVD, Jacobi, high-level
SYNOPSIS
Functions
subroutine
cgejsv
(joba, jobu, jobv, jobr, jobt, jobp, m, n, a,
lda, sva, u, ldu, v, ldv, cwork, lwork, rwork, lrwork,
iwork, info)
CGEJSV
subroutine
dgejsv
(joba, jobu, jobv, jobr, jobt,
jobp, m, n, a, lda, sva, u, ldu, v, ldv, work, lwork, iwork,
info)
DGEJSV
subroutine
sgejsv
(joba, jobu, jobv, jobr, jobt,
jobp, m, n, a, lda, sva, u, ldu, v, ldv, work, lwork, iwork,
info)
SGEJSV
subroutine
zgejsv
(joba, jobu, jobv, jobr, jobt,
jobp, m, n, a, lda, sva, u, ldu, v, ldv, cwork, lwork,
rwork, lrwork, iwork, info)
ZGEJSV
Detailed Description
Function Documentation
subroutine cgejsv (character*1 joba, character*1 jobu, character*1 jobv,character*1 jobr, character*1 jobt, character*1 jobp, integer m,integer n, complex, dimension( lda, * ) a, integer lda, real,dimension( n ) sva, complex, dimension( ldu, * ) u, integer ldu,complex, dimension( ldv, * ) v, integer ldv, complex, dimension( lwork) cwork, integer lwork, real, dimension( lrwork ) rwork, integerlrwork, integer, dimension( * ) iwork, integer info)
CGEJSV
Purpose:
CGEJSV computes
the singular value decomposition (SVD) of a complex M-by-N
matrix [A], where M >= N. The SVD of [A] is written
as
[A] = [U] * [SIGMA] * [V]Λ*,
where [SIGMA]
is an N-by-N (M-by-N) matrix which is zero except for its N
diagonal elements, [U] is an M-by-N (or M-by-M) unitary
matrix, and
[V] is an N-by-N unitary matrix. The diagonal elements of
[SIGMA] are
the singular values of [A]. The columns of [U] and [V] are
the left and
the right singular vectors of [A], respectively. The
matrices [U] and [V]
are computed and stored in the arrays U and V, respectively.
The diagonal
of [SIGMA] is computed and stored in the array SVA.
Parameters
JOBA
JOBA is
CHARACTER*1
Specifies the level of accuracy:
= βCβ: This option works well (high relative
accuracy) if A = B * D,
with well-conditioned B and arbitrary diagonal matrix D.
The accuracy cannot be spoiled by COLUMN scaling. The
accuracy of the computed output depends on the condition of
B, and the procedure aims at the best theoretical accuracy.
The relative error max_{i=1:N}|d sigma_i| / sigma_i is
bounded by f(M,N)*epsilon* cond(B), independent of D.
The input matrix is preprocessed with the QRF with column
pivoting. This initial preprocessing and preconditioning by
a rank revealing QR factorization is common for all values
of
JOBA. Additional actions are specified as follows:
= βEβ: Computation as with βCβ with
an additional estimate of the
condition number of B. It provides a realistic error bound.
= βFβ: If A = D1 * C * D2 with ill-conditioned
diagonal scalings
D1, D2, and well-conditioned matrix C, this option gives
higher accuracy than the βCβ option. If the
structure of the
input matrix is not known, and relative accuracy is
desirable, then this option is advisable. The input matrix A
is preprocessed with QR factorization with FULL (row and
column) pivoting.
= βGβ: Computation as with βFβ with
an additional estimate of the
condition number of B, where A=B*D. If A has heavily
weighted
rows, then using this condition number gives too pessimistic
error bound.
= βAβ: Small singular values are not well
determined by the data
and are considered as noisy; the matrix is treated as
numerically rank deficient. The error in the computed
singular values is bounded by f(m,n)*epsilon*||A||.
The computed SVD A = U * S * VΛ* restores A up to
f(m,n)*epsilon*||A||.
This gives the procedure the licence to discard (set to
zero)
all singular values below N*epsilon*||A||.
= βRβ: Similar as in βAβ. Rank
revealing property of the initial
QR factorization is used do reveal (using triangular factor)
a gap sigma_{r+1} < epsilon * sigma_r in which case the
numerical RANK is declared to be r. The SVD is computed with
absolute error bounds, but more accurately than with
βAβ.
JOBU
JOBU is
CHARACTER*1
Specifies whether to compute the columns of U:
= βUβ: N columns of U are returned in the array
U.
= βFβ: full set of M left sing. vectors is
returned in the array U.
= βWβ: U may be used as workspace of length M*N.
See the description
of U.
= βNβ: U is not computed.
JOBV
JOBV is
CHARACTER*1
Specifies whether to compute the matrix V:
= βVβ: N columns of V are returned in the array
V; Jacobi rotations
are not explicitly accumulated.
= βJβ: N columns of V are returned in the array
V, but they are
computed as the product of Jacobi rotations, if JOBT =
βNβ.
= βWβ: V may be used as workspace of length N*N.
See the description
of V.
= βNβ: V is not computed.
JOBR
JOBR is
CHARACTER*1
Specifies the RANGE for the singular values. Issues the
licence to
set to zero small positive singular values if they are
outside
specified range. If A .NE. 0 is scaled so that the largest
singular
value of c*A is around SQRT(BIG),
BIG=SLAMCH(βOβ), then JOBR issues
the licence to kill columns of A whose norm in c*A is less
than
SQRT(SFMIN) (for JOBR = βRβ), or less than
SMALL=SFMIN/EPSLN,
where SFMIN=SLAMCH(βSβ),
EPSLN=SLAMCH(βEβ).
= βNβ: Do not kill small columns of c*A. This
option assumes that
BLAS and QR factorizations and triangular solvers are
implemented to work in that range. If the condition of A
is greater than BIG, use CGESVJ.
= βRβ: RESTRICTED range for sigma(c*A) is
[SQRT(SFMIN), SQRT(BIG)]
(roughly, as described above). This option is recommended.
===========================
For computing the singular values in the FULL range
[SFMIN,BIG]
use CGESVJ.
JOBT
JOBT is
CHARACTER*1
If the matrix is square then the procedure may determine to
use
transposed A if AΛ* seems to be better with respect to
convergence.
If the matrix is not square, JOBT is ignored.
The decision is based on two values of entropy over the
adjoint
orbit of AΛ* * A. See the descriptions of RWORK(6) and
RWORK(7).
= βTβ: transpose if entropy test indicates
possibly faster
convergence of Jacobi process if AΛ* is taken as input.
If A is
replaced with AΛ*, then the row pivoting is included
automatically.
= βNβ: do not speculate.
The option βTβ can be used to compute only the
singular values, or
the full SVD (U, SIGMA and V). For only one set of singular
vectors
(U or V), the caller should provide both U and V, as one of
the
matrices is used as workspace if the matrix A is transposed.
The implementer can easily remove this constraint and make
the
code more complicated. See the descriptions of U and V.
In general, this option is considered experimental, and
βNβ; should
be preferred. This is subject to changes in the future.
JOBP
JOBP is
CHARACTER*1
Issues the licence to introduce structured perturbations to
drown
denormalized numbers. This licence should be active if the
denormals are poorly implemented, causing slow computation,
especially in cases of fast convergence (!). For details see
[1,2].
For the sake of simplicity, this perturbations are included
only
when the full SVD or only the singular values are requested.
The
implementer/user can easily add the perturbation for the
cases of
computing one set of singular vectors.
= βPβ: introduce perturbation
= βNβ: do not perturb
M
M is INTEGER
The number of rows of the input matrix A. M >= 0.
N
N is INTEGER
The number of columns of the input matrix A. M >= N >=
0.
A
A is COMPLEX
array, dimension (LDA,N)
On entry, the M-by-N matrix A.
LDA
LDA is INTEGER
The leading dimension of the array A. LDA >=
max(1,M).
SVA
SVA is REAL
array, dimension (N)
On exit,
- For RWORK(1)/RWORK(2) = ONE: The singular values of A.
During
the computation SVA contains Euclidean column norms of the
iterated matrices in the array A.
- For RWORK(1) .NE. RWORK(2): The singular values of A are
(RWORK(1)/RWORK(2)) * SVA(1:N). This factored form is used
if
sigma_max(A) overflows or if small singular values have been
saved from underflow by scaling the input matrix A.
- If JOBR=βRβ then some of the singular values
may be returned
as exact zeros obtained by βset to zeroβ because
they are
below the numerical rank threshold or are denormalized
numbers.
U
U is COMPLEX
array, dimension ( LDU, N ) or ( LDU, M )
If JOBU = βUβ, then U contains on exit the
M-by-N matrix of
the left singular vectors.
If JOBU = βFβ, then U contains on exit the
M-by-M matrix of
the left singular vectors, including an ONB
of the orthogonal complement of the Range(A).
If JOBU = βWβ .AND. (JOBV = βVβ
.AND. JOBT = βTβ .AND. M = N),
then U is used as workspace if the procedure
replaces A with AΛ*. In that case, [V] is computed
in U as left singular vectors of AΛ* and then
copied back to the V array. This βWβ option is
just
a reminder to the caller that in this case U is
reserved as workspace of length N*N.
If JOBU = βNβ U is not referenced, unless
JOBT=βTβ.
LDU
LDU is INTEGER
The leading dimension of the array U, LDU >= 1.
IF JOBU = βUβ or βFβ or
βWβ, then LDU >= M.
V
V is COMPLEX
array, dimension ( LDV, N )
If JOBV = βVβ, βJβ then V contains
on exit the N-by-N matrix of
the right singular vectors;
If JOBV = βWβ, AND (JOBU = βUβ AND
JOBT = βTβ AND M = N),
then V is used as workspace if the procedure
replaces A with AΛ*. In that case, [U] is computed
in V as right singular vectors of AΛ* and then
copied back to the U array. This βWβ option is
just
a reminder to the caller that in this case V is
reserved as workspace of length N*N.
If JOBV = βNβ V is not referenced, unless
JOBT=βTβ.
LDV
LDV is INTEGER
The leading dimension of the array V, LDV >= 1.
If JOBV = βVβ or βJβ or
βWβ, then LDV >= N.
CWORK
CWORK is
COMPLEX array, dimension (MAX(2,LWORK))
If the call to CGEJSV is a workspace query (indicated by
LWORK=-1 or
LRWORK=-1), then on exit CWORK(1) contains the required
length of
CWORK for the job parameters used in the call.
LWORK
LWORK is
INTEGER
Length of CWORK to confirm proper allocation of workspace.
LWORK depends on the job:
1. If only
SIGMA is needed ( JOBU = βNβ, JOBV =
βNβ ) and
1.1 .. no scaled condition estimate required
(JOBA.NE.βEβ.AND.JOBA.NE.βGβ):
LWORK >= 2*N+1. This is the minimal requirement.
->> For optimal performance (blocked code) the optimal
value
is LWORK >= N + (N+1)*NB. Here NB is the optimal
block size for CGEQP3 and CGEQRF.
In general, optimal LWORK is computed as
LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CGEQRF),
LWORK(CGESVJ)).
1.2. .. an estimate of the scaled condition number of A is
required (JOBA=βEβ, or βGβ). In this
case, LWORK the minimal
requirement is LWORK >= N*N + 2*N.
->> For optimal performance (blocked code) the optimal
value
is LWORK >= max(N+(N+1)*NB, N*N+2*N)=N**2+2*N.
In general, the optimal length LWORK is computed as
LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CGEQRF),
LWORK(CGESVJ),
N*N+LWORK(CPOCON)).
2. If SIGMA and the right singular vectors are needed (JOBV
= βVβ),
(JOBU = βNβ)
2.1 .. no scaled condition estimate requested (JOBE =
βNβ):
-> the minimal requirement is LWORK >= 3*N.
-> For optimal performance,
LWORK >= max(N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
where NB is the optimal block size for CGEQP3, CGEQRF,
CGELQF,
CUNMLQ. In general, the optimal length LWORK is computed as
LWORK >= max(N+LWORK(CGEQP3), N+LWORK(CGESVJ),
N+LWORK(CGELQF), 2*N+LWORK(CGEQRF), N+LWORK(CUNMLQ)).
2.2 .. an estimate of the scaled condition number of A is
required (JOBA=βEβ, or βGβ).
-> the minimal requirement is LWORK >= 3*N.
-> For optimal performance,
LWORK >= max(N+(N+1)*NB, 2*N,2*N+N*NB)=2*N+N*NB,
where NB is the optimal block size for CGEQP3, CGEQRF,
CGELQF,
CUNMLQ. In general, the optimal length LWORK is computed as
LWORK >= max(N+LWORK(CGEQP3), LWORK(CPOCON),
N+LWORK(CGESVJ),
N+LWORK(CGELQF), 2*N+LWORK(CGEQRF), N+LWORK(CUNMLQ)).
3. If SIGMA and the left singular vectors are needed
3.1 .. no scaled condition estimate requested (JOBE =
βNβ):
-> the minimal requirement is LWORK >= 3*N.
-> For optimal performance:
if JOBU = βUβ :: LWORK >= max(3*N,
N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
where NB is the optimal block size for CGEQP3, CGEQRF,
CUNMQR.
In general, the optimal length LWORK is computed as
LWORK >= max(N+LWORK(CGEQP3), 2*N+LWORK(CGEQRF),
N+LWORK(CUNMQR)).
3.2 .. an estimate of the scaled condition number of A is
required (JOBA=βEβ, or βGβ).
-> the minimal requirement is LWORK >= 3*N.
-> For optimal performance:
if JOBU = βUβ :: LWORK >= max(3*N,
N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
where NB is the optimal block size for CGEQP3, CGEQRF,
CUNMQR.
In general, the optimal length LWORK is computed as
LWORK >= max(N+LWORK(CGEQP3),N+LWORK(CPOCON),
2*N+LWORK(CGEQRF), N+LWORK(CUNMQR)).
4. If the full
SVD is needed: (JOBU = βUβ or JOBU =
βFβ) and
4.1. if JOBV = βVβ
the minimal requirement is LWORK >= 5*N+2*N*N.
4.2. if JOBV = βJβ the minimal requirement is
LWORK >= 4*N+N*N.
In both cases, the allocated CWORK can accommodate blocked
runs
of CGEQP3, CGEQRF, CGELQF, CUNMQR, CUNMLQ.
If the call to
CGEJSV is a workspace query (indicated by LWORK=-1 or
LRWORK=-1), then on exit CWORK(1) contains the optimal and
CWORK(2) contains the
minimal length of CWORK for the job parameters used in the
call.
RWORK
RWORK is REAL
array, dimension (MAX(7,LRWORK))
On exit,
RWORK(1) = Determines the scaling factor SCALE = RWORK(2) /
RWORK(1)
such that SCALE*SVA(1:N) are the computed singular values
of A. (See the description of SVA().)
RWORK(2) = See the description of RWORK(1).
RWORK(3) = SCONDA is an estimate for the condition number of
column equilibrated A. (If JOBA = βEβ or
βGβ)
SCONDA is an estimate of SQRT(||(RΛ* *
R)Λ(-1)||_1).
It is computed using CPOCON. It holds
NΛ(-1/4) * SCONDA <= ||RΛ(-1)||_2 <=
NΛ(1/4) * SCONDA
where R is the triangular factor from the QRF of A.
However, if R is truncated and the numerical rank is
determined to be strictly smaller than N, SCONDA is
returned as -1, thus indicating that the smallest
singular values might be lost.
If full SVD is
needed, the following two condition numbers are
useful for the analysis of the algorithm. They are provided
for
a developer/implementer who is familiar with the details of
the method.
RWORK(4) = an
estimate of the scaled condition number of the
triangular factor in the first QR factorization.
RWORK(5) = an estimate of the scaled condition number of the
triangular factor in the second QR factorization.
The following two parameters are computed if JOBT =
βTβ.
They are provided for a developer/implementer who is
familiar
with the details of the method.
RWORK(6) = the entropy of AΛ* * A :: this is the
Shannon entropy
of diag(AΛ* * A) / Trace(AΛ* * A) taken as point
in the
probability simplex.
RWORK(7) = the entropy of A * AΛ*. (See the description
of RWORK(6).)
If the call to CGEJSV is a workspace query (indicated by
LWORK=-1 or
LRWORK=-1), then on exit RWORK(1) contains the required
length of
RWORK for the job parameters used in the call.
LRWORK
LRWORK is
INTEGER
Length of RWORK to confirm proper allocation of workspace.
LRWORK depends on the job:
1. If only the
singular values are requested i.e. if
LSAME(JOBU,βNβ) .AND.
LSAME(JOBV,βNβ)
then:
1.1. If LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR.
LSAME(JOBA,βGβ),
then: LRWORK = max( 7, 2 * M ).
1.2. Otherwise, LRWORK = max( 7, N ).
2. If singular values with the right singular vectors are
requested
i.e. if
(LSAME(JOBV,βVβ).OR.LSAME(JOBV,βJβ))
.AND.
.NOT.(LSAME(JOBU,βUβ).OR.LSAME(JOBU,βFβ))
then:
2.1. If LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR.
LSAME(JOBA,βGβ),
then LRWORK = max( 7, 2 * M ).
2.2. Otherwise, LRWORK = max( 7, N ).
3. If singular values with the left singular vectors are
requested, i.e. if
(LSAME(JOBU,βUβ).OR.LSAME(JOBU,βFβ))
.AND.
.NOT.(LSAME(JOBV,βVβ).OR.LSAME(JOBV,βJβ))
then:
3.1. If LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR.
LSAME(JOBA,βGβ),
then LRWORK = max( 7, 2 * M ).
3.2. Otherwise, LRWORK = max( 7, N ).
4. If singular values with both the left and the right
singular vectors
are requested, i.e. if
(LSAME(JOBU,βUβ).OR.LSAME(JOBU,βFβ))
.AND.
(LSAME(JOBV,βVβ).OR.LSAME(JOBV,βJβ))
then:
4.1. If LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR.
LSAME(JOBA,βGβ),
then LRWORK = max( 7, 2 * M ).
4.2. Otherwise, LRWORK = max( 7, N ).
If, on entry,
LRWORK = -1 or LWORK=-1, a workspace query is assumed and
the length of RWORK is returned in RWORK(1).
IWORK
IWORK is
INTEGER array, of dimension at least 4, that further depends
on the job:
1. If only the
singular values are requested then:
If ( LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR. LSAME(JOBA,βGβ)
)
then the length of IWORK is N+M; otherwise the length of
IWORK is N.
2. If the singular values and the right singular vectors are
requested then:
If ( LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR. LSAME(JOBA,βGβ)
)
then the length of IWORK is N+M; otherwise the length of
IWORK is N.
3. If the singular values and the left singular vectors are
requested then:
If ( LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR. LSAME(JOBA,βGβ)
)
then the length of IWORK is N+M; otherwise the length of
IWORK is N.
4. If the singular values with both the left and the right
singular vectors
are requested, then:
4.1. If LSAME(JOBV,βJβ) the length of IWORK is
determined as follows:
If ( LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR. LSAME(JOBA,βGβ)
)
then the length of IWORK is N+M; otherwise the length of
IWORK is N.
4.2. If LSAME(JOBV,βVβ) the length of IWORK is
determined as follows:
If ( LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR. LSAME(JOBA,βGβ)
)
then the length of IWORK is 2*N+M; otherwise the length of
IWORK is 2*N.
On exit,
IWORK(1) = the numerical rank determined after the initial
QR factorization with pivoting. See the descriptions
of JOBA and JOBR.
IWORK(2) = the number of the computed nonzero singular
values
IWORK(3) = if nonzero, a warning message:
If IWORK(3) = 1 then some of the column norms of A
were denormalized floats. The requested high accuracy
is not warranted by the data.
IWORK(4) = 1 or -1. If IWORK(4) = 1, then the procedure used
AΛ* to
do the job as specified by the JOB parameters.
If the call to CGEJSV is a workspace query (indicated by
LWORK = -1 and
LRWORK = -1), then on exit IWORK(1) contains the required
length of
IWORK for the job parameters used in the call.
INFO
INFO is INTEGER
< 0: if INFO = -i, then the i-th argument had an illegal
value.
= 0: successful exit;
> 0: CGEJSV did not converge in the maximal allowed
number
of sweeps. The computed values may be inaccurate.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:
CGEJSV
implements a preconditioned Jacobi SVD algorithm. It uses
CGEQP3,
CGEQRF, and CGELQF as preprocessors and preconditioners.
Optionally, an
additional row pivoting can be used as a preprocessor, which
in some
cases results in much higher accuracy. An example is matrix
A with the
structure A = D1 * C * D2, where D1, D2 are arbitrarily
ill-conditioned
diagonal matrices and C is well-conditioned matrix. In that
case, complete
pivoting in the first QR factorizations provides accuracy
dependent on the
condition number of C, and independent of D1, D2. Such
higher accuracy is
not completely understood theoretically, but it works well
in practice.
Further, if A can be written as A = B*D, with
well-conditioned B and some
diagonal D, then the high accuracy is guaranteed, both
theoretically and
in software, independent of D. For more details see [1],
[2].
The computational range for the singular values can be the
full range
( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic
and the BLAS
& LAPACK routines called by CGEJSV are implemented to
work in that range.
If that is not the case, then the restriction for safe
computation with
the singular values in the range of normalized IEEE numbers
is that the
spectral condition number kappa(A)=sigma_max(A)/sigma_min(A)
does not
overflow. This code (CGEJSV) is best used in this restricted
range,
meaning that singular values of magnitude below ||A||_2 /
SLAMCH(βOβ) are
returned as zeros. See JOBR for details on this.
Further, this implementation is somewhat slower than the one
described
in [1,2] due to replacement of some non-LAPACK components,
and because
the choice of some tuning parameters in the iterative part
(CGESVJ) is
left to the implementer on a particular machine.
The rank revealing QR factorization (in this code: CGEQP3)
should be
implemented as in [3]. We have a new version of CGEQP3 under
development
that is more robust than the current one in LAPACK, with a
cleaner cut in
rank deficient cases. It will be available in the SIGMA
library [4].
If M is much larger than N, it is obvious that the initial
QRF with
column pivoting can be preprocessed by the QRF without
pivoting. That
well known trick is not used in CGEJSV because in some cases
heavy row
weighting can be treated with complete pivoting. The
overhead in cases
M much larger than N is then only due to pivoting, but the
benefits in
terms of accuracy have prevailed. The implementer/user can
incorporate
this extra QRF step easily. The implementer can also improve
data movement
(matrix transpose, matrix copy, matrix transposed copy) -
this
implementation of CGEJSV uses only the simplest, naive data
movement.
Contributor:
Zlatko Drmac (Zagreb, Croatia)
References:
[1] Z. Drmac
and K. Veselic: New fast and accurate Jacobi SVD algorithm
I.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp.
1322-1342.
LAPACK Working note 169.
[2] Z. Drmac and K. Veselic: New fast and accurate Jacobi
SVD algorithm II.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp.
1343-1362.
LAPACK Working note 170.
[3] Z. Drmac and Z. Bujanovic: On the failure of
rank-revealing QR
factorization software - a case study.
ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
LAPACK Working note 176.
[4] Z. Drmac: SIGMA - mathematical software library for
accurate SVD, PSV,
QSVD, (H,K)-SVD computations.
Department of Mathematics, University of Zagreb, 2008,
2016.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac@math.hr. Thank you.
subroutine dgejsv (character*1 joba, character*1 jobu, character*1 jobv,character*1 jobr, character*1 jobt, character*1 jobp, integer m,integer n, double precision, dimension( lda, * ) a, integer lda, doubleprecision, dimension( n ) sva, double precision, dimension( ldu, * ) u,integer ldu, double precision, dimension( ldv, * ) v, integer ldv,double precision, dimension( lwork ) work, integer lwork, integer,dimension( * ) iwork, integer info)
DGEJSV
Purpose:
DGEJSV computes
the singular value decomposition (SVD) of a real M-by-N
matrix [A], where M >= N. The SVD of [A] is written
as
[A] = [U] * [SIGMA] * [V]Λt,
where [SIGMA]
is an N-by-N (M-by-N) matrix which is zero except for its N
diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal
matrix, and
[V] is an N-by-N orthogonal matrix. The diagonal elements of
[SIGMA] are
the singular values of [A]. The columns of [U] and [V] are
the left and
the right singular vectors of [A], respectively. The
matrices [U] and [V]
are computed and stored in the arrays U and V, respectively.
The diagonal
of [SIGMA] is computed and stored in the array SVA.
DGEJSV can sometimes compute tiny singular values and their
singular vectors much
more accurately than other SVD routines, see below under
Further Details.
Parameters
JOBA
JOBA is
CHARACTER*1
Specifies the level of accuracy:
= βCβ: This option works well (high relative
accuracy) if A = B * D,
with well-conditioned B and arbitrary diagonal matrix D.
The accuracy cannot be spoiled by COLUMN scaling. The
accuracy of the computed output depends on the condition of
B, and the procedure aims at the best theoretical accuracy.
The relative error max_{i=1:N}|d sigma_i| / sigma_i is
bounded by f(M,N)*epsilon* cond(B), independent of D.
The input matrix is preprocessed with the QRF with column
pivoting. This initial preprocessing and preconditioning by
a rank revealing QR factorization is common for all values
of
JOBA. Additional actions are specified as follows:
= βEβ: Computation as with βCβ with
an additional estimate of the
condition number of B. It provides a realistic error bound.
= βFβ: If A = D1 * C * D2 with ill-conditioned
diagonal scalings
D1, D2, and well-conditioned matrix C, this option gives
higher accuracy than the βCβ option. If the
structure of the
input matrix is not known, and relative accuracy is
desirable, then this option is advisable. The input matrix A
is preprocessed with QR factorization with FULL (row and
column) pivoting.
= βGβ: Computation as with βFβ with
an additional estimate of the
condition number of B, where A=D*B. If A has heavily
weighted
rows, then using this condition number gives too pessimistic
error bound.
= βAβ: Small singular values are the noise and
the matrix is treated
as numerically rank deficient. The error in the computed
singular values is bounded by f(m,n)*epsilon*||A||.
The computed SVD A = U * S * VΛt restores A up to
f(m,n)*epsilon*||A||.
This gives the procedure the licence to discard (set to
zero)
all singular values below N*epsilon*||A||.
= βRβ: Similar as in βAβ. Rank
revealing property of the initial
QR factorization is used do reveal (using triangular factor)
a gap sigma_{r+1} < epsilon * sigma_r in which case the
numerical RANK is declared to be r. The SVD is computed with
absolute error bounds, but more accurately than with
βAβ.
JOBU
JOBU is
CHARACTER*1
Specifies whether to compute the columns of U:
= βUβ: N columns of U are returned in the array
U.
= βFβ: full set of M left sing. vectors is
returned in the array U.
= βWβ: U may be used as workspace of length M*N.
See the description
of U.
= βNβ: U is not computed.
JOBV
JOBV is
CHARACTER*1
Specifies whether to compute the matrix V:
= βVβ: N columns of V are returned in the array
V; Jacobi rotations
are not explicitly accumulated.
= βJβ: N columns of V are returned in the array
V, but they are
computed as the product of Jacobi rotations. This option is
allowed only if JOBU .NE. βNβ, i.e. in computing
the full SVD.
= βWβ: V may be used as workspace of length N*N.
See the description
of V.
= βNβ: V is not computed.
JOBR
JOBR is
CHARACTER*1
Specifies the RANGE for the singular values. Issues the
licence to
set to zero small positive singular values if they are
outside
specified range. If A .NE. 0 is scaled so that the largest
singular
value of c*A is around DSQRT(BIG),
BIG=SLAMCH(βOβ), then JOBR issues
the licence to kill columns of A whose norm in c*A is less
than
DSQRT(SFMIN) (for JOBR = βRβ), or less than
SMALL=SFMIN/EPSLN,
where SFMIN=SLAMCH(βSβ),
EPSLN=SLAMCH(βEβ).
= βNβ: Do not kill small columns of c*A. This
option assumes that
BLAS and QR factorizations and triangular solvers are
implemented to work in that range. If the condition of A
is greater than BIG, use DGESVJ.
= βRβ: RESTRICTED range for sigma(c*A) is
[DSQRT(SFMIN), DSQRT(BIG)]
(roughly, as described above). This option is recommended.
ΛΛΛΛΛΛΛΛΛΛΛΛΛΛΛΛΛΛΛΛΛΛΛΛΛΛΛ
For computing the singular values in the FULL range
[SFMIN,BIG]
use DGESVJ.
JOBT
JOBT is
CHARACTER*1
If the matrix is square then the procedure may determine to
use
transposed A if AΛt seems to be better with respect to
convergence.
If the matrix is not square, JOBT is ignored. This is
subject to
changes in the future.
The decision is based on two values of entropy over the
adjoint
orbit of AΛt * A. See the descriptions of WORK(6) and
WORK(7).
= βTβ: transpose if entropy test indicates
possibly faster
convergence of Jacobi process if AΛt is taken as input.
If A is
replaced with AΛt, then the row pivoting is included
automatically.
= βNβ: do not speculate.
This option can be used to compute only the singular values,
or the
full SVD (U, SIGMA and V). For only one set of singular
vectors
(U or V), the caller should provide both U and V, as one of
the
matrices is used as workspace if the matrix A is transposed.
The implementer can easily remove this constraint and make
the
code more complicated. See the descriptions of U and V.
JOBP
JOBP is
CHARACTER*1
Issues the licence to introduce structured perturbations to
drown
denormalized numbers. This licence should be active if the
denormals are poorly implemented, causing slow computation,
especially in cases of fast convergence (!). For details see
[1,2].
For the sake of simplicity, this perturbations are included
only
when the full SVD or only the singular values are requested.
The
implementer/user can easily add the perturbation for the
cases of
computing one set of singular vectors.
= βPβ: introduce perturbation
= βNβ: do not perturb
M
M is INTEGER
The number of rows of the input matrix A. M >= 0.
N
N is INTEGER
The number of columns of the input matrix A. M >= N >=
0.
A
A is DOUBLE
PRECISION array, dimension (LDA,N)
On entry, the M-by-N matrix A.
LDA
LDA is INTEGER
The leading dimension of the array A. LDA >=
max(1,M).
SVA
SVA is DOUBLE
PRECISION array, dimension (N)
On exit,
- For WORK(1)/WORK(2) = ONE: The singular values of A.
During the
computation SVA contains Euclidean column norms of the
iterated matrices in the array A.
- For WORK(1) .NE. WORK(2): The singular values of A are
(WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
sigma_max(A) overflows or if small singular values have been
saved from underflow by scaling the input matrix A.
- If JOBR=βRβ then some of the singular values
may be returned
as exact zeros obtained by βset to zeroβ because
they are
below the numerical rank threshold or are denormalized
numbers.
U
U is DOUBLE
PRECISION array, dimension ( LDU, N ) or ( LDU, M )
If JOBU = βUβ, then U contains on exit the
M-by-N matrix of
the left singular vectors.
If JOBU = βFβ, then U contains on exit the
M-by-M matrix of
the left singular vectors, including an ONB
of the orthogonal complement of the Range(A).
If JOBU = βWβ .AND. (JOBV = βVβ
.AND. JOBT = βTβ .AND. M = N),
then U is used as workspace if the procedure
replaces A with AΛt. In that case, [V] is computed
in U as left singular vectors of AΛt and then
copied back to the V array. This βWβ option is
just
a reminder to the caller that in this case U is
reserved as workspace of length N*N.
If JOBU = βNβ U is not referenced, unless
JOBT=βTβ.
LDU
LDU is INTEGER
The leading dimension of the array U, LDU >= 1.
IF JOBU = βUβ or βFβ or
βWβ, then LDU >= M.
V
V is DOUBLE
PRECISION array, dimension ( LDV, N )
If JOBV = βVβ, βJβ then V contains
on exit the N-by-N matrix of
the right singular vectors;
If JOBV = βWβ, AND (JOBU = βUβ AND
JOBT = βTβ AND M = N),
then V is used as workspace if the procedure
replaces A with AΛt. In that case, [U] is computed
in V as right singular vectors of AΛt and then
copied back to the U array. This βWβ option is
just
a reminder to the caller that in this case V is
reserved as workspace of length N*N.
If JOBV = βNβ V is not referenced, unless
JOBT=βTβ.
LDV
LDV is INTEGER
The leading dimension of the array V, LDV >= 1.
If JOBV = βVβ or βJβ or
βWβ, then LDV >= N.
WORK
WORK is DOUBLE
PRECISION array, dimension (MAX(7,LWORK))
On exit, if N > 0 .AND. M > 0 (else not referenced),
WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor
such
that SCALE*SVA(1:N) are the computed singular values
of A. (See the description of SVA().)
WORK(2) = See the description of WORK(1).
WORK(3) = SCONDA is an estimate for the condition number of
column equilibrated A. (If JOBA = βEβ or
βGβ)
SCONDA is an estimate of DSQRT(||(RΛt *
R)Λ(-1)||_1).
It is computed using DPOCON. It holds
NΛ(-1/4) * SCONDA <= ||RΛ(-1)||_2 <=
NΛ(1/4) * SCONDA
where R is the triangular factor from the QRF of A.
However, if R is truncated and the numerical rank is
determined to be strictly smaller than N, SCONDA is
returned as -1, thus indicating that the smallest
singular values might be lost.
If full SVD is
needed, the following two condition numbers are
useful for the analysis of the algorithm. They are provided
for
a developer/implementer who is familiar with the details of
the method.
WORK(4) = an
estimate of the scaled condition number of the
triangular factor in the first QR factorization.
WORK(5) = an estimate of the scaled condition number of the
triangular factor in the second QR factorization.
The following two parameters are computed if JOBT =
βTβ.
They are provided for a developer/implementer who is
familiar
with the details of the method.
WORK(6) = the
entropy of AΛt*A :: this is the Shannon entropy
of diag(AΛt*A) / Trace(AΛt*A) taken as point in
the
probability simplex.
WORK(7) = the entropy of A*AΛt.
LWORK
LWORK is
INTEGER
Length of WORK to confirm proper allocation of work space.
LWORK depends on the job:
If only SIGMA
is needed (JOBU = βNβ, JOBV = βNβ)
and
-> .. no scaled condition estimate required (JOBE =
βNβ):
LWORK >= max(2*M+N,4*N+1,7). This is the minimal
requirement.
->> For optimal performance (blocked code) the optimal
value
is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the
optimal
block size for DGEQP3 and DGEQRF.
In general, optimal LWORK is computed as
LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7).
-> .. an estimate of the scaled condition number of A is
required (JOBA=βEβ, βGβ). In this
case, LWORK is the maximum
of the above and N*N+4*N, i.e. LWORK >=
max(2*M+N,N*N+4*N,7).
->> For optimal performance (blocked code) the optimal
value
is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
In general, the optimal length LWORK is computed as
LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF),
N+N*N+LWORK(DPOCON),7).
If SIGMA and
the right singular vectors are needed (JOBV =
βVβ),
-> the minimal requirement is LWORK >=
max(2*M+N,4*N+1,7).
-> For optimal performance, LWORK >=
max(2*M+N,3*N+(N+1)*NB,7),
where NB is the optimal block size for DGEQP3, DGEQRF,
DGELQF,
DORMLQ. In general, the optimal length LWORK is computed as
LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON),
N+LWORK(DGELQF), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)).
If SIGMA and
the left singular vectors are needed
-> the minimal requirement is LWORK >=
max(2*M+N,4*N+1,7).
-> For optimal performance:
if JOBU = βUβ :: LWORK >=
max(2*M+N,3*N+(N+1)*NB,7),
if JOBU = βFβ :: LWORK >=
max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
where NB is the optimal block size for DGEQP3, DGEQRF,
DORMQR.
In general, the optimal length LWORK is computed as
LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON),
2*N+LWORK(DGEQRF), N+LWORK(DORMQR)).
Here LWORK(DORMQR) equals N*NB (for JOBU = βUβ)
or
M*NB (for JOBU = βFβ).
If the full SVD
is needed: (JOBU = βUβ or JOBU =
βFβ) and
-> if JOBV = βVβ
the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N).
-> if JOBV = βJβ the minimal requirement is
LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
-> For optimal performance, LWORK should be additionally
larger than N+M*NB, where NB is the optimal block size
for DORMQR.
IWORK
IWORK is
INTEGER array, dimension (MAX(3,M+3*N)).
On exit,
IWORK(1) = the numerical rank determined after the initial
QR factorization with pivoting. See the descriptions
of JOBA and JOBR.
IWORK(2) = the number of the computed nonzero singular
values
IWORK(3) = if nonzero, a warning message:
If IWORK(3) = 1 then some of the column norms of A
were denormalized floats. The requested high accuracy
is not warranted by the data.
INFO
INFO is INTEGER
< 0: if INFO = -i, then the i-th argument had an illegal
value.
= 0: successful exit;
> 0: DGEJSV did not converge in the maximal allowed
number
of sweeps. The computed values may be inaccurate.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:
DGEJSV
implements a preconditioned Jacobi SVD algorithm. It uses
DGEQP3,
DGEQRF, and DGELQF as preprocessors and preconditioners.
Optionally, an
additional row pivoting can be used as a preprocessor, which
in some
cases results in much higher accuracy. An example is matrix
A with the
structure A = D1 * C * D2, where D1, D2 are arbitrarily
ill-conditioned
diagonal matrices and C is well-conditioned matrix. In that
case, complete
pivoting in the first QR factorizations provides accuracy
dependent on the
condition number of C, and independent of D1, D2. Such
higher accuracy is
not completely understood theoretically, but it works well
in practice.
Further, if A can be written as A = B*D, with
well-conditioned B and some
diagonal D, then the high accuracy is guaranteed, both
theoretically and
in software, independent of D. For more details see [1],
[2].
The computational range for the singular values can be the
full range
( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic
and the BLAS
& LAPACK routines called by DGEJSV are implemented to
work in that range.
If that is not the case, then the restriction for safe
computation with
the singular values in the range of normalized IEEE numbers
is that the
spectral condition number kappa(A)=sigma_max(A)/sigma_min(A)
does not
overflow. This code (DGEJSV) is best used in this restricted
range,
meaning that singular values of magnitude below ||A||_2 /
DLAMCH(βOβ) are
returned as zeros. See JOBR for details on this.
Further, this implementation is somewhat slower than the one
described
in [1,2] due to replacement of some non-LAPACK components,
and because
the choice of some tuning parameters in the iterative part
(DGESVJ) is
left to the implementer on a particular machine.
The rank revealing QR factorization (in this code: DGEQP3)
should be
implemented as in [3]. We have a new version of DGEQP3 under
development
that is more robust than the current one in LAPACK, with a
cleaner cut in
rank deficient cases. It will be available in the SIGMA
library [4].
If M is much larger than N, it is obvious that the initial
QRF with
column pivoting can be preprocessed by the QRF without
pivoting. That
well known trick is not used in DGEJSV because in some cases
heavy row
weighting can be treated with complete pivoting. The
overhead in cases
M much larger than N is then only due to pivoting, but the
benefits in
terms of accuracy have prevailed. The implementer/user can
incorporate
this extra QRF step easily. The implementer can also improve
data movement
(matrix transpose, matrix copy, matrix transposed copy) -
this
implementation of DGEJSV uses only the simplest, naive data
movement.
Contributors:
Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
[1] Z. Drmac
and K. Veselic: New fast and accurate Jacobi SVD algorithm
I.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp.
1322-1342.
LAPACK Working note 169.
[2] Z. Drmac and K. Veselic: New fast and accurate Jacobi
SVD algorithm II.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp.
1343-1362.
LAPACK Working note 170.
[3] Z. Drmac and Z. Bujanovic: On the failure of
rank-revealing QR
factorization software - a case study.
ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
LAPACK Working note 176.
[4] Z. Drmac: SIGMA - mathematical software library for
accurate SVD, PSV,
QSVD, (H,K)-SVD computations.
Department of Mathematics, University of Zagreb, 2008.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac@math.hr. Thank you.
subroutine sgejsv (character*1 joba, character*1 jobu, character*1 jobv,character*1 jobr, character*1 jobt, character*1 jobp, integer m,integer n, real, dimension( lda, * ) a, integer lda, real, dimension( n) sva, real, dimension( ldu, * ) u, integer ldu, real, dimension( ldv,* ) v, integer ldv, real, dimension( lwork ) work, integer lwork,integer, dimension( * ) iwork, integer info)
SGEJSV
Purpose:
SGEJSV computes
the singular value decomposition (SVD) of a real M-by-N
matrix [A], where M >= N. The SVD of [A] is written
as
[A] = [U] * [SIGMA] * [V]Λt,
where [SIGMA]
is an N-by-N (M-by-N) matrix which is zero except for its N
diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal
matrix, and
[V] is an N-by-N orthogonal matrix. The diagonal elements of
[SIGMA] are
the singular values of [A]. The columns of [U] and [V] are
the left and
the right singular vectors of [A], respectively. The
matrices [U] and [V]
are computed and stored in the arrays U and V, respectively.
The diagonal
of [SIGMA] is computed and stored in the array SVA.
SGEJSV can sometimes compute tiny singular values and their
singular vectors much
more accurately than other SVD routines, see below under
Further Details.
Parameters
JOBA
JOBA is
CHARACTER*1
Specifies the level of accuracy:
= βCβ: This option works well (high relative
accuracy) if A = B * D,
with well-conditioned B and arbitrary diagonal matrix D.
The accuracy cannot be spoiled by COLUMN scaling. The
accuracy of the computed output depends on the condition of
B, and the procedure aims at the best theoretical accuracy.
The relative error max_{i=1:N}|d sigma_i| / sigma_i is
bounded by f(M,N)*epsilon* cond(B), independent of D.
The input matrix is preprocessed with the QRF with column
pivoting. This initial preprocessing and preconditioning by
a rank revealing QR factorization is common for all values
of
JOBA. Additional actions are specified as follows:
= βEβ: Computation as with βCβ with
an additional estimate of the
condition number of B. It provides a realistic error bound.
= βFβ: If A = D1 * C * D2 with ill-conditioned
diagonal scalings
D1, D2, and well-conditioned matrix C, this option gives
higher accuracy than the βCβ option. If the
structure of the
input matrix is not known, and relative accuracy is
desirable, then this option is advisable. The input matrix A
is preprocessed with QR factorization with FULL (row and
column) pivoting.
= βGβ: Computation as with βFβ with
an additional estimate of the
condition number of B, where A=D*B. If A has heavily
weighted
rows, then using this condition number gives too pessimistic
error bound.
= βAβ: Small singular values are the noise and
the matrix is treated
as numerically rank deficient. The error in the computed
singular values is bounded by f(m,n)*epsilon*||A||.
The computed SVD A = U * S * VΛt restores A up to
f(m,n)*epsilon*||A||.
This gives the procedure the licence to discard (set to
zero)
all singular values below N*epsilon*||A||.
= βRβ: Similar as in βAβ. Rank
revealing property of the initial
QR factorization is used do reveal (using triangular factor)
a gap sigma_{r+1} < epsilon * sigma_r in which case the
numerical RANK is declared to be r. The SVD is computed with
absolute error bounds, but more accurately than with
βAβ.
JOBU
JOBU is
CHARACTER*1
Specifies whether to compute the columns of U:
= βUβ: N columns of U are returned in the array
U.
= βFβ: full set of M left sing. vectors is
returned in the array U.
= βWβ: U may be used as workspace of length M*N.
See the description
of U.
= βNβ: U is not computed.
JOBV
JOBV is
CHARACTER*1
Specifies whether to compute the matrix V:
= βVβ: N columns of V are returned in the array
V; Jacobi rotations
are not explicitly accumulated.
= βJβ: N columns of V are returned in the array
V, but they are
computed as the product of Jacobi rotations. This option is
allowed only if JOBU .NE. βNβ, i.e. in computing
the full SVD.
= βWβ: V may be used as workspace of length N*N.
See the description
of V.
= βNβ: V is not computed.
JOBR
JOBR is
CHARACTER*1
Specifies the RANGE for the singular values. Issues the
licence to
set to zero small positive singular values if they are
outside
specified range. If A .NE. 0 is scaled so that the largest
singular
value of c*A is around SQRT(BIG),
BIG=SLAMCH(βOβ), then JOBR issues
the licence to kill columns of A whose norm in c*A is less
than
SQRT(SFMIN) (for JOBR = βRβ), or less than
SMALL=SFMIN/EPSLN,
where SFMIN=SLAMCH(βSβ),
EPSLN=SLAMCH(βEβ).
= βNβ: Do not kill small columns of c*A. This
option assumes that
BLAS and QR factorizations and triangular solvers are
implemented to work in that range. If the condition of A
is greater than BIG, use SGESVJ.
= βRβ: RESTRICTED range for sigma(c*A) is
[SQRT(SFMIN), SQRT(BIG)]
(roughly, as described above). This option is recommended.
===========================
For computing the singular values in the FULL range
[SFMIN,BIG]
use SGESVJ.
JOBT
JOBT is
CHARACTER*1
If the matrix is square then the procedure may determine to
use
transposed A if AΛt seems to be better with respect to
convergence.
If the matrix is not square, JOBT is ignored. This is
subject to
changes in the future.
The decision is based on two values of entropy over the
adjoint
orbit of AΛt * A. See the descriptions of WORK(6) and
WORK(7).
= βTβ: transpose if entropy test indicates
possibly faster
convergence of Jacobi process if AΛt is taken as input.
If A is
replaced with AΛt, then the row pivoting is included
automatically.
= βNβ: do not speculate.
This option can be used to compute only the singular values,
or the
full SVD (U, SIGMA and V). For only one set of singular
vectors
(U or V), the caller should provide both U and V, as one of
the
matrices is used as workspace if the matrix A is transposed.
The implementer can easily remove this constraint and make
the
code more complicated. See the descriptions of U and V.
JOBP
JOBP is
CHARACTER*1
Issues the licence to introduce structured perturbations to
drown
denormalized numbers. This licence should be active if the
denormals are poorly implemented, causing slow computation,
especially in cases of fast convergence (!). For details see
[1,2].
For the sake of simplicity, this perturbations are included
only
when the full SVD or only the singular values are requested.
The
implementer/user can easily add the perturbation for the
cases of
computing one set of singular vectors.
= βPβ: introduce perturbation
= βNβ: do not perturb
M
M is INTEGER
The number of rows of the input matrix A. M >= 0.
N
N is INTEGER
The number of columns of the input matrix A. M >= N >=
0.
A
A is REAL
array, dimension (LDA,N)
On entry, the M-by-N matrix A.
LDA
LDA is INTEGER
The leading dimension of the array A. LDA >=
max(1,M).
SVA
SVA is REAL
array, dimension (N)
On exit,
- For WORK(1)/WORK(2) = ONE: The singular values of A.
During the
computation SVA contains Euclidean column norms of the
iterated matrices in the array A.
- For WORK(1) .NE. WORK(2): The singular values of A are
(WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
sigma_max(A) overflows or if small singular values have been
saved from underflow by scaling the input matrix A.
- If JOBR=βRβ then some of the singular values
may be returned
as exact zeros obtained by βset to zeroβ because
they are
below the numerical rank threshold or are denormalized
numbers.
U
U is REAL
array, dimension ( LDU, N ) or ( LDU, M )
If JOBU = βUβ, then U contains on exit the
M-by-N matrix of
the left singular vectors.
If JOBU = βFβ, then U contains on exit the
M-by-M matrix of
the left singular vectors, including an ONB
of the orthogonal complement of the Range(A).
If JOBU = βWβ .AND. (JOBV = βVβ
.AND. JOBT = βTβ .AND. M = N),
then U is used as workspace if the procedure
replaces A with AΛt. In that case, [V] is computed
in U as left singular vectors of AΛt and then
copied back to the V array. This βWβ option is
just
a reminder to the caller that in this case U is
reserved as workspace of length N*N.
If JOBU = βNβ U is not referenced, unless
JOBT=βTβ.
LDU
LDU is INTEGER
The leading dimension of the array U, LDU >= 1.
IF JOBU = βUβ or βFβ or
βWβ, then LDU >= M.
V
V is REAL
array, dimension ( LDV, N )
If JOBV = βVβ, βJβ then V contains
on exit the N-by-N matrix of
the right singular vectors;
If JOBV = βWβ, AND (JOBU = βUβ AND
JOBT = βTβ AND M = N),
then V is used as workspace if the procedure
replaces A with AΛt. In that case, [U] is computed
in V as right singular vectors of AΛt and then
copied back to the U array. This βWβ option is
just
a reminder to the caller that in this case V is
reserved as workspace of length N*N.
If JOBV = βNβ V is not referenced, unless
JOBT=βTβ.
LDV
LDV is INTEGER
The leading dimension of the array V, LDV >= 1.
If JOBV = βVβ or βJβ or
βWβ, then LDV >= N.
WORK
WORK is REAL
array, dimension (MAX(7,LWORK))
On exit,
WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor
such
that SCALE*SVA(1:N) are the computed singular values
of A. (See the description of SVA().)
WORK(2) = See the description of WORK(1).
WORK(3) = SCONDA is an estimate for the condition number of
column equilibrated A. (If JOBA = βEβ or
βGβ)
SCONDA is an estimate of SQRT(||(RΛt *
R)Λ(-1)||_1).
It is computed using SPOCON. It holds
NΛ(-1/4) * SCONDA <= ||RΛ(-1)||_2 <=
NΛ(1/4) * SCONDA
where R is the triangular factor from the QRF of A.
However, if R is truncated and the numerical rank is
determined to be strictly smaller than N, SCONDA is
returned as -1, thus indicating that the smallest
singular values might be lost.
If full SVD is
needed, the following two condition numbers are
useful for the analysis of the algorithm. They are provided
for
a developer/implementer who is familiar with the details of
the method.
WORK(4) = an
estimate of the scaled condition number of the
triangular factor in the first QR factorization.
WORK(5) = an estimate of the scaled condition number of the
triangular factor in the second QR factorization.
The following two parameters are computed if JOBT =
βTβ.
They are provided for a developer/implementer who is
familiar
with the details of the method.
WORK(6) = the
entropy of AΛt*A :: this is the Shannon entropy
of diag(AΛt*A) / Trace(AΛt*A) taken as point in
the
probability simplex.
WORK(7) = the entropy of A*AΛt.
LWORK
LWORK is
INTEGER
Length of WORK to confirm proper allocation of work space.
LWORK depends on the job:
If only SIGMA
is needed ( JOBU = βNβ, JOBV = βNβ )
and
-> .. no scaled condition estimate required (JOBE =
βNβ):
LWORK >= max(2*M+N,4*N+1,7). This is the minimal
requirement.
->> For optimal performance (blocked code) the optimal
value
is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the
optimal
block size for SGEQP3 and SGEQRF.
In general, optimal LWORK is computed as
LWORK >= max(2*M+N,N+LWORK(SGEQP3),N+LWORK(SGEQRF), 7).
-> .. an estimate of the scaled condition number of A is
required (JOBA=βEβ, βGβ). In this
case, LWORK is the maximum
of the above and N*N+4*N, i.e. LWORK >=
max(2*M+N,N*N+4*N,7).
->> For optimal performance (blocked code) the optimal
value
is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
In general, the optimal length LWORK is computed as
LWORK >= max(2*M+N,N+LWORK(SGEQP3),N+LWORK(SGEQRF),
N+N*N+LWORK(SPOCON),7).
If SIGMA and
the right singular vectors are needed (JOBV =
βVβ),
-> the minimal requirement is LWORK >=
max(2*M+N,4*N+1,7).
-> For optimal performance, LWORK >=
max(2*M+N,3*N+(N+1)*NB,7),
where NB is the optimal block size for SGEQP3, SGEQRF,
SGELQF,
SORMLQ. In general, the optimal length LWORK is computed as
LWORK >= max(2*M+N,N+LWORK(SGEQP3), N+LWORK(SPOCON),
N+LWORK(SGELQF), 2*N+LWORK(SGEQRF), N+LWORK(SORMLQ)).
If SIGMA and
the left singular vectors are needed
-> the minimal requirement is LWORK >=
max(2*M+N,4*N+1,7).
-> For optimal performance:
if JOBU = βUβ :: LWORK >=
max(2*M+N,3*N+(N+1)*NB,7),
if JOBU = βFβ :: LWORK >=
max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
where NB is the optimal block size for SGEQP3, SGEQRF,
SORMQR.
In general, the optimal length LWORK is computed as
LWORK >= max(2*M+N,N+LWORK(SGEQP3),N+LWORK(SPOCON),
2*N+LWORK(SGEQRF), N+LWORK(SORMQR)).
Here LWORK(SORMQR) equals N*NB (for JOBU = βUβ)
or
M*NB (for JOBU = βFβ).
If the full SVD
is needed: (JOBU = βUβ or JOBU =
βFβ) and
-> if JOBV = βVβ
the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N).
-> if JOBV = βJβ the minimal requirement is
LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
-> For optimal performance, LWORK should be additionally
larger than N+M*NB, where NB is the optimal block size
for SORMQR.
IWORK
IWORK is
INTEGER array, dimension (MAX(3,M+3*N)).
On exit,
IWORK(1) = the numerical rank determined after the initial
QR factorization with pivoting. See the descriptions
of JOBA and JOBR.
IWORK(2) = the number of the computed nonzero singular
values
IWORK(3) = if nonzero, a warning message:
If IWORK(3) = 1 then some of the column norms of A
were denormalized floats. The requested high accuracy
is not warranted by the data.
INFO
INFO is INTEGER
< 0: if INFO = -i, then the i-th argument had an illegal
value.
= 0: successful exit;
> 0: SGEJSV did not converge in the maximal allowed
number
of sweeps. The computed values may be inaccurate.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:
SGEJSV
implements a preconditioned Jacobi SVD algorithm. It uses
SGEQP3,
SGEQRF, and SGELQF as preprocessors and preconditioners.
Optionally, an
additional row pivoting can be used as a preprocessor, which
in some
cases results in much higher accuracy. An example is matrix
A with the
structure A = D1 * C * D2, where D1, D2 are arbitrarily
ill-conditioned
diagonal matrices and C is well-conditioned matrix. In that
case, complete
pivoting in the first QR factorizations provides accuracy
dependent on the
condition number of C, and independent of D1, D2. Such
higher accuracy is
not completely understood theoretically, but it works well
in practice.
Further, if A can be written as A = B*D, with
well-conditioned B and some
diagonal D, then the high accuracy is guaranteed, both
theoretically and
in software, independent of D. For more details see [1],
[2].
The computational range for the singular values can be the
full range
( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic
and the BLAS
& LAPACK routines called by SGEJSV are implemented to
work in that range.
If that is not the case, then the restriction for safe
computation with
the singular values in the range of normalized IEEE numbers
is that the
spectral condition number kappa(A)=sigma_max(A)/sigma_min(A)
does not
overflow. This code (SGEJSV) is best used in this restricted
range,
meaning that singular values of magnitude below ||A||_2 /
SLAMCH(βOβ) are
returned as zeros. See JOBR for details on this.
Further, this implementation is somewhat slower than the one
described
in [1,2] due to replacement of some non-LAPACK components,
and because
the choice of some tuning parameters in the iterative part
(SGESVJ) is
left to the implementer on a particular machine.
The rank revealing QR factorization (in this code: SGEQP3)
should be
implemented as in [3]. We have a new version of SGEQP3 under
development
that is more robust than the current one in LAPACK, with a
cleaner cut in
rank deficient cases. It will be available in the SIGMA
library [4].
If M is much larger than N, it is obvious that the initial
QRF with
column pivoting can be preprocessed by the QRF without
pivoting. That
well known trick is not used in SGEJSV because in some cases
heavy row
weighting can be treated with complete pivoting. The
overhead in cases
M much larger than N is then only due to pivoting, but the
benefits in
terms of accuracy have prevailed. The implementer/user can
incorporate
this extra QRF step easily. The implementer can also improve
data movement
(matrix transpose, matrix copy, matrix transposed copy) -
this
implementation of SGEJSV uses only the simplest, naive data
movement.
Contributors:
Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
References:
[1] Z. Drmac
and K. Veselic: New fast and accurate Jacobi SVD algorithm
I.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp.
1322-1342.
LAPACK Working note 169.
[2] Z. Drmac and K. Veselic: New fast and accurate Jacobi
SVD algorithm II.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp.
1343-1362.
LAPACK Working note 170.
[3] Z. Drmac and Z. Bujanovic: On the failure of
rank-revealing QR
factorization software - a case study.
ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
LAPACK Working note 176.
[4] Z. Drmac: SIGMA - mathematical software library for
accurate SVD, PSV,
QSVD, (H,K)-SVD computations.
Department of Mathematics, University of Zagreb, 2008.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac@math.hr. Thank you.
subroutine zgejsv (character*1 joba, character*1 jobu, character*1 jobv,character*1 jobr, character*1 jobt, character*1 jobp, integer m,integer n, complex*16, dimension( lda, * ) a, integer lda, doubleprecision, dimension( n ) sva, complex*16, dimension( ldu, * ) u,integer ldu, complex*16, dimension( ldv, * ) v, integer ldv,complex*16, dimension( lwork ) cwork, integer lwork, double precision,dimension( lrwork ) rwork, integer lrwork, integer, dimension( * )iwork, integer info)
ZGEJSV
Purpose:
ZGEJSV computes
the singular value decomposition (SVD) of a complex M-by-N
matrix [A], where M >= N. The SVD of [A] is written
as
[A] = [U] * [SIGMA] * [V]Λ*,
where [SIGMA]
is an N-by-N (M-by-N) matrix which is zero except for its N
diagonal elements, [U] is an M-by-N (or M-by-M) unitary
matrix, and
[V] is an N-by-N unitary matrix. The diagonal elements of
[SIGMA] are
the singular values of [A]. The columns of [U] and [V] are
the left and
the right singular vectors of [A], respectively. The
matrices [U] and [V]
are computed and stored in the arrays U and V, respectively.
The diagonal
of [SIGMA] is computed and stored in the array SVA.
Parameters
JOBA
JOBA is
CHARACTER*1
Specifies the level of accuracy:
= βCβ: This option works well (high relative
accuracy) if A = B * D,
with well-conditioned B and arbitrary diagonal matrix D.
The accuracy cannot be spoiled by COLUMN scaling. The
accuracy of the computed output depends on the condition of
B, and the procedure aims at the best theoretical accuracy.
The relative error max_{i=1:N}|d sigma_i| / sigma_i is
bounded by f(M,N)*epsilon* cond(B), independent of D.
The input matrix is preprocessed with the QRF with column
pivoting. This initial preprocessing and preconditioning by
a rank revealing QR factorization is common for all values
of
JOBA. Additional actions are specified as follows:
= βEβ: Computation as with βCβ with
an additional estimate of the
condition number of B. It provides a realistic error bound.
= βFβ: If A = D1 * C * D2 with ill-conditioned
diagonal scalings
D1, D2, and well-conditioned matrix C, this option gives
higher accuracy than the βCβ option. If the
structure of the
input matrix is not known, and relative accuracy is
desirable, then this option is advisable. The input matrix A
is preprocessed with QR factorization with FULL (row and
column) pivoting.
= βGβ: Computation as with βFβ with
an additional estimate of the
condition number of B, where A=B*D. If A has heavily
weighted
rows, then using this condition number gives too pessimistic
error bound.
= βAβ: Small singular values are not well
determined by the data
and are considered as noisy; the matrix is treated as
numerically rank deficient. The error in the computed
singular values is bounded by f(m,n)*epsilon*||A||.
The computed SVD A = U * S * VΛ* restores A up to
f(m,n)*epsilon*||A||.
This gives the procedure the licence to discard (set to
zero)
all singular values below N*epsilon*||A||.
= βRβ: Similar as in βAβ. Rank
revealing property of the initial
QR factorization is used do reveal (using triangular factor)
a gap sigma_{r+1} < epsilon * sigma_r in which case the
numerical RANK is declared to be r. The SVD is computed with
absolute error bounds, but more accurately than with
βAβ.
JOBU
JOBU is
CHARACTER*1
Specifies whether to compute the columns of U:
= βUβ: N columns of U are returned in the array
U.
= βFβ: full set of M left sing. vectors is
returned in the array U.
= βWβ: U may be used as workspace of length M*N.
See the description
of U.
= βNβ: U is not computed.
JOBV
JOBV is
CHARACTER*1
Specifies whether to compute the matrix V:
= βVβ: N columns of V are returned in the array
V; Jacobi rotations
are not explicitly accumulated.
= βJβ: N columns of V are returned in the array
V, but they are
computed as the product of Jacobi rotations, if JOBT =
βNβ.
= βWβ: V may be used as workspace of length N*N.
See the description
of V.
= βNβ: V is not computed.
JOBR
JOBR is
CHARACTER*1
Specifies the RANGE for the singular values. Issues the
licence to
set to zero small positive singular values if they are
outside
specified range. If A .NE. 0 is scaled so that the largest
singular
value of c*A is around SQRT(BIG),
BIG=DLAMCH(βOβ), then JOBR issues
the licence to kill columns of A whose norm in c*A is less
than
SQRT(SFMIN) (for JOBR = βRβ), or less than
SMALL=SFMIN/EPSLN,
where SFMIN=DLAMCH(βSβ),
EPSLN=DLAMCH(βEβ).
= βNβ: Do not kill small columns of c*A. This
option assumes that
BLAS and QR factorizations and triangular solvers are
implemented to work in that range. If the condition of A
is greater than BIG, use ZGESVJ.
= βRβ: RESTRICTED range for sigma(c*A) is
[SQRT(SFMIN), SQRT(BIG)]
(roughly, as described above). This option is recommended.
===========================
For computing the singular values in the FULL range
[SFMIN,BIG]
use ZGESVJ.
JOBT
JOBT is
CHARACTER*1
If the matrix is square then the procedure may determine to
use
transposed A if AΛ* seems to be better with respect to
convergence.
If the matrix is not square, JOBT is ignored.
The decision is based on two values of entropy over the
adjoint
orbit of AΛ* * A. See the descriptions of RWORK(6) and
RWORK(7).
= βTβ: transpose if entropy test indicates
possibly faster
convergence of Jacobi process if AΛ* is taken as input.
If A is
replaced with AΛ*, then the row pivoting is included
automatically.
= βNβ: do not speculate.
The option βTβ can be used to compute only the
singular values, or
the full SVD (U, SIGMA and V). For only one set of singular
vectors
(U or V), the caller should provide both U and V, as one of
the
matrices is used as workspace if the matrix A is transposed.
The implementer can easily remove this constraint and make
the
code more complicated. See the descriptions of U and V.
In general, this option is considered experimental, and
βNβ; should
be preferred. This is subject to changes in the future.
JOBP
JOBP is
CHARACTER*1
Issues the licence to introduce structured perturbations to
drown
denormalized numbers. This licence should be active if the
denormals are poorly implemented, causing slow computation,
especially in cases of fast convergence (!). For details see
[1,2].
For the sake of simplicity, this perturbations are included
only
when the full SVD or only the singular values are requested.
The
implementer/user can easily add the perturbation for the
cases of
computing one set of singular vectors.
= βPβ: introduce perturbation
= βNβ: do not perturb
M
M is INTEGER
The number of rows of the input matrix A. M >= 0.
N
N is INTEGER
The number of columns of the input matrix A. M >= N >=
0.
A
A is COMPLEX*16
array, dimension (LDA,N)
On entry, the M-by-N matrix A.
LDA
LDA is INTEGER
The leading dimension of the array A. LDA >=
max(1,M).
SVA
SVA is DOUBLE
PRECISION array, dimension (N)
On exit,
- For RWORK(1)/RWORK(2) = ONE: The singular values of A.
During
the computation SVA contains Euclidean column norms of the
iterated matrices in the array A.
- For RWORK(1) .NE. RWORK(2): The singular values of A are
(RWORK(1)/RWORK(2)) * SVA(1:N). This factored form is used
if
sigma_max(A) overflows or if small singular values have been
saved from underflow by scaling the input matrix A.
- If JOBR=βRβ then some of the singular values
may be returned
as exact zeros obtained by βset to zeroβ because
they are
below the numerical rank threshold or are denormalized
numbers.
U
U is COMPLEX*16
array, dimension ( LDU, N )
If JOBU = βUβ, then U contains on exit the
M-by-N matrix of
the left singular vectors.
If JOBU = βFβ, then U contains on exit the
M-by-M matrix of
the left singular vectors, including an ONB
of the orthogonal complement of the Range(A).
If JOBU = βWβ .AND. (JOBV = βVβ
.AND. JOBT = βTβ .AND. M = N),
then U is used as workspace if the procedure
replaces A with AΛ*. In that case, [V] is computed
in U as left singular vectors of AΛ* and then
copied back to the V array. This βWβ option is
just
a reminder to the caller that in this case U is
reserved as workspace of length N*N.
If JOBU = βNβ U is not referenced, unless
JOBT=βTβ.
LDU
LDU is INTEGER
The leading dimension of the array U, LDU >= 1.
IF JOBU = βUβ or βFβ or
βWβ, then LDU >= M.
V
V is COMPLEX*16
array, dimension ( LDV, N )
If JOBV = βVβ, βJβ then V contains
on exit the N-by-N matrix of
the right singular vectors;
If JOBV = βWβ, AND (JOBU = βUβ AND
JOBT = βTβ AND M = N),
then V is used as workspace if the procedure
replaces A with AΛ*. In that case, [U] is computed
in V as right singular vectors of AΛ* and then
copied back to the U array. This βWβ option is
just
a reminder to the caller that in this case V is
reserved as workspace of length N*N.
If JOBV = βNβ V is not referenced, unless
JOBT=βTβ.
LDV
LDV is INTEGER
The leading dimension of the array V, LDV >= 1.
If JOBV = βVβ or βJβ or
βWβ, then LDV >= N.
CWORK
CWORK is
COMPLEX*16 array, dimension (MAX(2,LWORK))
If the call to ZGEJSV is a workspace query (indicated by
LWORK=-1 or
LRWORK=-1), then on exit CWORK(1) contains the required
length of
CWORK for the job parameters used in the call.
LWORK
LWORK is
INTEGER
Length of CWORK to confirm proper allocation of workspace.
LWORK depends on the job:
1. If only
SIGMA is needed ( JOBU = βNβ, JOBV =
βNβ ) and
1.1 .. no scaled condition estimate required
(JOBA.NE.βEβ.AND.JOBA.NE.βGβ):
LWORK >= 2*N+1. This is the minimal requirement.
->> For optimal performance (blocked code) the optimal
value
is LWORK >= N + (N+1)*NB. Here NB is the optimal
block size for ZGEQP3 and ZGEQRF.
In general, optimal LWORK is computed as
LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF),
LWORK(ZGESVJ)).
1.2. .. an estimate of the scaled condition number of A is
required (JOBA=βEβ, or βGβ). In this
case, LWORK the minimal
requirement is LWORK >= N*N + 2*N.
->> For optimal performance (blocked code) the optimal
value
is LWORK >= max(N+(N+1)*NB, N*N+2*N)=N**2+2*N.
In general, the optimal length LWORK is computed as
LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF),
LWORK(ZGESVJ),
N*N+LWORK(ZPOCON)).
2. If SIGMA and the right singular vectors are needed (JOBV
= βVβ),
(JOBU = βNβ)
2.1 .. no scaled condition estimate requested (JOBE =
βNβ):
-> the minimal requirement is LWORK >= 3*N.
-> For optimal performance,
LWORK >= max(N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
where NB is the optimal block size for ZGEQP3, ZGEQRF,
ZGELQF,
ZUNMLQ. In general, the optimal length LWORK is computed as
LWORK >= max(N+LWORK(ZGEQP3), N+LWORK(ZGESVJ),
N+LWORK(ZGELQF), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMLQ)).
2.2 .. an estimate of the scaled condition number of A is
required (JOBA=βEβ, or βGβ).
-> the minimal requirement is LWORK >= 3*N.
-> For optimal performance,
LWORK >= max(N+(N+1)*NB, 2*N,2*N+N*NB)=2*N+N*NB,
where NB is the optimal block size for ZGEQP3, ZGEQRF,
ZGELQF,
ZUNMLQ. In general, the optimal length LWORK is computed as
LWORK >= max(N+LWORK(ZGEQP3), LWORK(ZPOCON),
N+LWORK(ZGESVJ),
N+LWORK(ZGELQF), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMLQ)).
3. If SIGMA and the left singular vectors are needed
3.1 .. no scaled condition estimate requested (JOBE =
βNβ):
-> the minimal requirement is LWORK >= 3*N.
-> For optimal performance:
if JOBU = βUβ :: LWORK >= max(3*N,
N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
where NB is the optimal block size for ZGEQP3, ZGEQRF,
ZUNMQR.
In general, the optimal length LWORK is computed as
LWORK >= max(N+LWORK(ZGEQP3), 2*N+LWORK(ZGEQRF),
N+LWORK(ZUNMQR)).
3.2 .. an estimate of the scaled condition number of A is
required (JOBA=βEβ, or βGβ).
-> the minimal requirement is LWORK >= 3*N.
-> For optimal performance:
if JOBU = βUβ :: LWORK >= max(3*N,
N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB,
where NB is the optimal block size for ZGEQP3, ZGEQRF,
ZUNMQR.
In general, the optimal length LWORK is computed as
LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZPOCON),
2*N+LWORK(ZGEQRF), N+LWORK(ZUNMQR)).
4. If the full SVD is needed: (JOBU = βUβ or
JOBU = βFβ) and
4.1. if JOBV = βVβ
the minimal requirement is LWORK >= 5*N+2*N*N.
4.2. if JOBV = βJβ the minimal requirement is
LWORK >= 4*N+N*N.
In both cases, the allocated CWORK can accommodate blocked
runs
of ZGEQP3, ZGEQRF, ZGELQF, SUNMQR, ZUNMLQ.
If the call to
ZGEJSV is a workspace query (indicated by LWORK=-1 or
LRWORK=-1), then on exit CWORK(1) contains the optimal and
CWORK(2) contains the
minimal length of CWORK for the job parameters used in the
call.
RWORK
RWORK is DOUBLE
PRECISION array, dimension (MAX(7,LRWORK))
On exit,
RWORK(1) = Determines the scaling factor SCALE = RWORK(2) /
RWORK(1)
such that SCALE*SVA(1:N) are the computed singular values
of A. (See the description of SVA().)
RWORK(2) = See the description of RWORK(1).
RWORK(3) = SCONDA is an estimate for the condition number of
column equilibrated A. (If JOBA = βEβ or
βGβ)
SCONDA is an estimate of SQRT(||(RΛ* *
R)Λ(-1)||_1).
It is computed using ZPOCON. It holds
NΛ(-1/4) * SCONDA <= ||RΛ(-1)||_2 <=
NΛ(1/4) * SCONDA
where R is the triangular factor from the QRF of A.
However, if R is truncated and the numerical rank is
determined to be strictly smaller than N, SCONDA is
returned as -1, thus indicating that the smallest
singular values might be lost.
If full SVD is
needed, the following two condition numbers are
useful for the analysis of the algorithm. They are provided
for
a developer/implementer who is familiar with the details of
the method.
RWORK(4) = an
estimate of the scaled condition number of the
triangular factor in the first QR factorization.
RWORK(5) = an estimate of the scaled condition number of the
triangular factor in the second QR factorization.
The following two parameters are computed if JOBT =
βTβ.
They are provided for a developer/implementer who is
familiar
with the details of the method.
RWORK(6) = the entropy of AΛ* * A :: this is the
Shannon entropy
of diag(AΛ* * A) / Trace(AΛ* * A) taken as point
in the
probability simplex.
RWORK(7) = the entropy of A * AΛ*. (See the description
of RWORK(6).)
If the call to ZGEJSV is a workspace query (indicated by
LWORK=-1 or
LRWORK=-1), then on exit RWORK(1) contains the required
length of
RWORK for the job parameters used in the call.
LRWORK
LRWORK is
INTEGER
Length of RWORK to confirm proper allocation of workspace.
LRWORK depends on the job:
1. If only the
singular values are requested i.e. if
LSAME(JOBU,βNβ) .AND.
LSAME(JOBV,βNβ)
then:
1.1. If LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR.
LSAME(JOBA,βGβ),
then: LRWORK = max( 7, 2 * M ).
1.2. Otherwise, LRWORK = max( 7, N ).
2. If singular values with the right singular vectors are
requested
i.e. if
(LSAME(JOBV,βVβ).OR.LSAME(JOBV,βJβ))
.AND.
.NOT.(LSAME(JOBU,βUβ).OR.LSAME(JOBU,βFβ))
then:
2.1. If LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR.
LSAME(JOBA,βGβ),
then LRWORK = max( 7, 2 * M ).
2.2. Otherwise, LRWORK = max( 7, N ).
3. If singular values with the left singular vectors are
requested, i.e. if
(LSAME(JOBU,βUβ).OR.LSAME(JOBU,βFβ))
.AND.
.NOT.(LSAME(JOBV,βVβ).OR.LSAME(JOBV,βJβ))
then:
3.1. If LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR.
LSAME(JOBA,βGβ),
then LRWORK = max( 7, 2 * M ).
3.2. Otherwise, LRWORK = max( 7, N ).
4. If singular values with both the left and the right
singular vectors
are requested, i.e. if
(LSAME(JOBU,βUβ).OR.LSAME(JOBU,βFβ))
.AND.
(LSAME(JOBV,βVβ).OR.LSAME(JOBV,βJβ))
then:
4.1. If LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR.
LSAME(JOBA,βGβ),
then LRWORK = max( 7, 2 * M ).
4.2. Otherwise, LRWORK = max( 7, N ).
If, on entry,
LRWORK = -1 or LWORK=-1, a workspace query is assumed and
the length of RWORK is returned in RWORK(1).
IWORK
IWORK is
INTEGER array, of dimension at least 4, that further depends
on the job:
1. If only the
singular values are requested then:
If ( LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR. LSAME(JOBA,βGβ)
)
then the length of IWORK is N+M; otherwise the length of
IWORK is N.
2. If the singular values and the right singular vectors are
requested then:
If ( LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR. LSAME(JOBA,βGβ)
)
then the length of IWORK is N+M; otherwise the length of
IWORK is N.
3. If the singular values and the left singular vectors are
requested then:
If ( LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR. LSAME(JOBA,βGβ)
)
then the length of IWORK is N+M; otherwise the length of
IWORK is N.
4. If the singular values with both the left and the right
singular vectors
are requested, then:
4.1. If LSAME(JOBV,βJβ) the length of IWORK is
determined as follows:
If ( LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR. LSAME(JOBA,βGβ)
)
then the length of IWORK is N+M; otherwise the length of
IWORK is N.
4.2. If LSAME(JOBV,βVβ) the length of IWORK is
determined as follows:
If ( LSAME(JOBT,βTβ) .OR.
LSAME(JOBA,βFβ) .OR. LSAME(JOBA,βGβ)
)
then the length of IWORK is 2*N+M; otherwise the length of
IWORK is 2*N.
On exit,
IWORK(1) = the numerical rank determined after the initial
QR factorization with pivoting. See the descriptions
of JOBA and JOBR.
IWORK(2) = the number of the computed nonzero singular
values
IWORK(3) = if nonzero, a warning message:
If IWORK(3) = 1 then some of the column norms of A
were denormalized floats. The requested high accuracy
is not warranted by the data.
IWORK(4) = 1 or -1. If IWORK(4) = 1, then the procedure used
AΛ* to
do the job as specified by the JOB parameters.
If the call to ZGEJSV is a workspace query (indicated by
LWORK = -1 or
LRWORK = -1), then on exit IWORK(1) contains the required
length of
IWORK for the job parameters used in the call.
INFO
INFO is INTEGER
< 0: if INFO = -i, then the i-th argument had an illegal
value.
= 0: successful exit;
> 0: ZGEJSV did not converge in the maximal allowed
number
of sweeps. The computed values may be inaccurate.
Author
Univ. of Tennessee
Univ. of California Berkeley
Univ. of Colorado Denver
NAG Ltd.
Further Details:
ZGEJSV
implements a preconditioned Jacobi SVD algorithm. It uses
ZGEQP3,
ZGEQRF, and ZGELQF as preprocessors and preconditioners.
Optionally, an
additional row pivoting can be used as a preprocessor, which
in some
cases results in much higher accuracy. An example is matrix
A with the
structure A = D1 * C * D2, where D1, D2 are arbitrarily
ill-conditioned
diagonal matrices and C is well-conditioned matrix. In that
case, complete
pivoting in the first QR factorizations provides accuracy
dependent on the
condition number of C, and independent of D1, D2. Such
higher accuracy is
not completely understood theoretically, but it works well
in practice.
Further, if A can be written as A = B*D, with
well-conditioned B and some
diagonal D, then the high accuracy is guaranteed, both
theoretically and
in software, independent of D. For more details see [1],
[2].
The computational range for the singular values can be the
full range
( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic
and the BLAS
& LAPACK routines called by ZGEJSV are implemented to
work in that range.
If that is not the case, then the restriction for safe
computation with
the singular values in the range of normalized IEEE numbers
is that the
spectral condition number kappa(A)=sigma_max(A)/sigma_min(A)
does not
overflow. This code (ZGEJSV) is best used in this restricted
range,
meaning that singular values of magnitude below ||A||_2 /
DLAMCH(βOβ) are
returned as zeros. See JOBR for details on this.
Further, this implementation is somewhat slower than the one
described
in [1,2] due to replacement of some non-LAPACK components,
and because
the choice of some tuning parameters in the iterative part
(ZGESVJ) is
left to the implementer on a particular machine.
The rank revealing QR factorization (in this code: ZGEQP3)
should be
implemented as in [3]. We have a new version of ZGEQP3 under
development
that is more robust than the current one in LAPACK, with a
cleaner cut in
rank deficient cases. It will be available in the SIGMA
library [4].
If M is much larger than N, it is obvious that the initial
QRF with
column pivoting can be preprocessed by the QRF without
pivoting. That
well known trick is not used in ZGEJSV because in some cases
heavy row
weighting can be treated with complete pivoting. The
overhead in cases
M much larger than N is then only due to pivoting, but the
benefits in
terms of accuracy have prevailed. The implementer/user can
incorporate
this extra QRF step easily. The implementer can also improve
data movement
(matrix transpose, matrix copy, matrix transposed copy) -
this
implementation of ZGEJSV uses only the simplest, naive data
movement.
Contributor:
Zlatko Drmac, Department of Mathematics, Faculty of Science, University of Zagreb (Zagreb, Croatia); drmac@math.hr
References:
[1] Z. Drmac
and K. Veselic: New fast and accurate Jacobi SVD algorithm
I.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp.
1322-1342.
LAPACK Working note 169.
[2] Z. Drmac and K. Veselic: New fast and accurate Jacobi
SVD algorithm II.
SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp.
1343-1362.
LAPACK Working note 170.
[3] Z. Drmac and Z. Bujanovic: On the failure of
rank-revealing QR
factorization software - a case study.
ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
LAPACK Working note 176.
[4] Z. Drmac: SIGMA - mathematical software library for
accurate SVD, PSV,
QSVD, (H,K)-SVD computations.
Department of Mathematics, University of Zagreb, 2008,
2016.
Bugs, examples and comments:
Please report all bugs and send interesting examples and/or comments to drmac@math.hr. Thank you.
Author
Generated automatically by Doxygen for LAPACK from the source code.