#Experimental Math Spring 2019 Rutgers University
#Victoria Chayes
#5/12/2019
#Contribution to Class Project

Help:=proc():
print(`NLTVPD(DATA,n,CWMParam,FParam,PDHGParam,full): Does non-local total variation clustering using the primal dual hybrid gradient algorithm to sort data.`):
print(`Inputs:`):
print(`DATA: the list of data to be sorted, given as a list of lists.`):
print(`n: the number of clusters for the data to be sorted into.`):
print(`CWMParam= a list of parameters [e,c] for the calculation of the weight matrix (a measure of how close each datapoint is to the other datapoints). These parameters influence the overall final clustering, ie changing these parameters changes the output.`):
print(`---e: a positive number determining the weight given to the standard Euclidean distance as a measure of the distance between the vectors. Set e=1 and c=0 to construct a weight matrix solely using Euclidean distance.`):
print(`---c: a positive number determining the weight given to the cosine similarity as a measure of the distance between the vectors. Set c=1 and e=0 to construct a weight matrix solely using cosine similarity.`):
print(`FParam= a list of parameters [e,c,l] for the calculation of the fidelity term of the energy functional. These parameters influence the overall final clustering, ie changing these parameters changes the output.`):
print(`---e,c: the same weights for Euclidean vs cosine distance as above. It is recommended that the same e and c be chosen in FParam and CWMParam.`):
print(`---l: a positive number determining the weight in the energy functional given to the fidelity term.`):
print(`PDHGParam= a list of parameters [theta,sigma,tau,tol,innerloop,outerloop]. These parameters influence the rate of convergence, but not the final clustering, ie changing these parameters changes how long the program runs but not the output.`):
print(`---theta, sigma, tau: three strictly positive numbers satisfying 0<theta<=1 and sigma*tau<=1; sigma and tau are divided internally by the largest entry of the weight matrix, which gives the usual PDHG step-size condition.`):
print(`---tol: a nonnegative number; the outer loop stops once the change between successive clusterings drops below tol.`):
print(`---innerloop: a positive integer, the number of iterations of the inner primal dual loop.`):
print(`---outerloop: a positive integer, the number of outer iterations when a fixed iteration count is used. In the source code, comment out 'while d>tol do' and un-comment 'for z from 1 to outerloop do' to have the stopping criterion be a set number of iterations instead of for changes in the assignment function to be less than a certain tolerance.`):
print(`full: set to 1 to receive the datapoints directly sorted into n clusters as an output. set to 0 (or any other value besides 1) to receive the final Uhard clustering assignment matrix as the output.`):
print(`Outputs:`):
print(`The datapoints in a list of n lists of lists, with each of the n lists of lists indicating a sorted cluster of data, OR the final Uhard clustering assignment matrix as a list of lists.`):
print(`Try:`):
print(`NLTVPD(ExtractFields(MR,[3,4,5,6,7,8,9]),4,[1,0],[1,0,10^6],[1,2.5,.01,0,10,25],1)`):
print(`Call HelpAux() to see all Auxiliary functions.`):
end:

HelpAux:=proc():
print(`AverageVector(M), Comp(n,k), ConstructWeightMatrix(DATA,e,c), ConvertHard(i,n), CountInstances(temp,k), DumbThreshhold(U), Fidelity(g,C,e,c), FindFlatIndex(temp), GreaterThanEq(temp,a), NewCentroids(SDATA), NoDoubleCounting(temp), NormalizeData(DATA), Norms(DATA), NonlocalDivergence(W,v), NonlocalGradient(W,U), PDInner(u,f), PermuteList(L,P), ProjectEdge(M), RandomInitialCentroids(DATA,n), SetAToB(M,a,b), SimplexGrid(n,s), SortData(DATA,Uhard), Threshhold(U,s,eta,y)`):
end:

#################################################################################
#################################################################################
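#The weight matrix combines two notions of distance between datapoints, as described in the Help
#text above. The following is a minimal illustrative sketch (not one of the original auxiliary
#functions) of the weighted combination of Euclidean distance and cosine dissimilarity that the
#parameters e and c control; it assumes u and v are nonzero vectors given as lists.
HybridDistanceSketch:=proc(u,v,e,c) local n,dotp,normu,normv,i:
n:=nops(u):
dotp:=add(u[i]*v[i],i=1..n):
normu:=sqrt(add(u[i]^2,i=1..n)):
normv:=sqrt(add(v[i]^2,i=1..n)):
#e weights the Euclidean distance, c weights the cosine dissimilarity 1-cos(u,v);
#e=1, c=0 gives pure Euclidean distance, e=0, c=1 gives pure cosine dissimilarity
e*sqrt(add((u[i]-v[i])^2,i=1..n))+c*(1-dotp/(normu*normv)):
end: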
#NLTVPD(DATA,n,CWMParam,FParam,PDHGParam,full): Does non-local total variation clustering using the primal dual hybrid gradient algorithm to sort data.
#Inputs:
#DATA: the list of data to be sorted, given as a list of lists.
#n: the number of clusters for the data to be sorted into.
#CWMParam= a list of parameters [e,c] for the calculation of the weight matrix (a measure of how close each datapoint is to the other datapoints). These parameters influence the overall final clustering, ie changing these parameters changes the output.
# e: a positive number determining the weight given to the standard Euclidean distance as a measure of the distance between the vectors. Set e=1 and c=0 to construct a weight matrix solely using Euclidean distance.
# c: a positive number determining the weight given to the cosine similarity as a measure of the distance between the vectors. Set c=1 and e=0 to construct a weight matrix solely using cosine similarity.
#FParam= a list of parameters [e,c,l] for the calculation of the fidelity term of the energy functional. These parameters influence the overall final clustering, ie changing these parameters changes the output.
# e: a positive number determining the weight given to the standard Euclidean distance as a measure of the distance between the vectors. Set e=1 and c=0 to construct the fidelity term solely using Euclidean distance. It is recommended that the same e be chosen in FParam and CWMParam.
# c: a positive number determining the weight given to the cosine similarity as a measure of the distance between the vectors. Set c=1 and e=0 to construct the fidelity term solely using cosine similarity. It is recommended that the same c be chosen in FParam and CWMParam.
# l: a positive number determining the weight in the energy functional given to the fidelity term.
#PDHGParam= a list of parameters [theta,sigma,tau,tol,innerloop,outerloop]. These parameters influence the rate of convergence, but not the final clustering, ie changing these parameters changes how long the program runs but not the output.
# theta, sigma, tau: three strictly positive numbers satisfying 0<theta<=1 and sigma*tau<=1; sigma and tau are divided internally by the largest entry of the weight matrix, which gives the usual PDHG step-size condition.
# tol: a nonnegative number; the outer loop stops once the change between successive clusterings drops below tol.
# innerloop: a positive integer, the number of iterations of the inner primal dual loop.
# outerloop: a positive integer, the number of outer iterations when a fixed iteration count is used. Comment out `while d>tol do` and un-comment `for z from 1 to outerloop do` to have the stopping criterion be a set number of iterations instead of for changes in the assignment function to be less than a certain tolerance.
#full: set to 1 to receive the datapoints directly sorted into n clusters as an output. set to 0 (or any other value besides 1) to receive the final Uhard clustering assignment matrix as the output.
#Outputs:
#The datapoints in a list of n lists of lists, with each of the n lists of lists indicating a sorted cluster of data, OR the final Uhard clustering assignment matrix as a list of lists.
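#A worked check of the step-size parameters (assuming the standard PDHG requirement that
#sigma*tau*L^2<=1, where L here is estimated by the largest entry of the weight matrix):
#with the suggested PDHGParam=[1,2.5,.01,0,10,25], the internal steps are sigma=2.5/opnorm and
#tau=.01/opnorm, so sigma*tau*opnorm^2=2.5*.01=.025<=1, and theta=1 lies in (0,1], so the
#suggested parameters are admissible.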
NLTVPD:=proc(DATA,n,CWMParam,FParam,PDHGParam,full)
local m1,m2,W,C,theta,opnorm,sigma,tau,tol,d,innerloop,U,Ubar,P,F,i,j,k,innercount,Uhard,hope,rec,temp,tempF,SDATA,COld,Uold,z,outerloop:
m1:=nops(DATA): m2:=nops(DATA[1]):
C:=RandomInitialCentroids(DATA,n):
W:=NormalizeData(ConstructWeightMatrix(DATA,CWMParam[1],CWMParam[2])):
theta:=PDHGParam[1]:
opnorm:=max(W):
sigma:=PDHGParam[2]/opnorm:
tau:=PDHGParam[3]/opnorm:
tol:=PDHGParam[4]:
d:=2*tol+1:
innerloop:=PDHGParam[5]:
outerloop:=PDHGParam[6]:
U:=[seq([seq(1/m1,i=1..n)],j=1..m1)]:
Ubar:=U:
P:=[[[0$m1]$m1]$n]:
while d>tol do
#for z from 1 to outerloop do
F:=[]:
for i from 1 to m1 do
F:=[op(F),Fidelity(DATA[i],C,FParam[1],FParam[2],FParam[3])]:
od:
F:=tau*F+[[1$n]$m1]:
innercount:=0:
while innercount<innerloop do
Uold:=U:
#dual step (restored, assumed to be the standard PDHG ascent: add sigma times the nonlocal
#gradient of the kth column of Ubar to P[k], then rescale P[k] so its entries lie in [-1,1])
temp:=ListTools[Transpose](Ubar):
for k from 1 to n do
P[k]:=P[k]+sigma*~NonlocalGradient(W,temp[k]):
hope:=1:
for i from 1 to m1 do
rec:=max(op(map(abs,P[k][i]))):
if rec>hope then hope:=rec: fi:
od:
P[k]:=P[k]/hope:
od:
#primal step: descend in U using the nonlocal divergence, then reweight by the fidelity term
temp:=ListTools[Transpose](U):
tempF:=ListTools[Transpose](F):
for k from 1 to n do
temp[k]:=temp[k]+tau*~NonlocalDivergence(W,P[k]):
temp[k]:=temp[k]*~(tempF[k]^~(-1/2)):
od:
U:=ListTools[Transpose](temp):
#re-project each row of U onto the simplex
for i from 1 to m1 do
tempF:=F[i]^~(1/2):
U[i]:=PDInner(U[i],tempF):
od:
#over-relaxation step
Ubar:=U+theta*~(U-Uold):
innercount:=innercount+1:
od:
#Uhard:=Threshhold(U,ThreshParam[1],ThreshParam[2],ThreshParam[3]):
Uhard:=DumbThreshhold(U):
SDATA:=SortData(DATA,Uhard):
COld:=C:
C:=NewCentroids(SDATA):
#restored guard (assumed): if the centroid update came back degenerate, reinitialize
while nops(C)<n do
C:=RandomInitialCentroids(DATA,n):
od:
#restored stopping measure (assumed): the largest change between successive centroid sets
d:=max(seq(seq(abs(C[i][j]-COld[i][j]),j=1..m2),i=1..n)):
od:
if full=1 then
SDATA:
else
Uhard:
fi:
end:
#PDInner(u,f):
#Inputs:
#u: a vector given as a list of n entries
#f: a list of n positive weights
#Outputs:
#the projection of u, in the metric weighted by f, onto the probability simplex: a list with nonnegative entries summing to 1
PDInner:=proc(u,f) local n,a,sa,cond,tmpsum,tmpsumdom,ind,tmax,i:
n:=nops(u):
a:=sort(u*~f,`>`,output=[sorted,permutation]):
sa:=PermuteList(f,a[2]):
cond:=a[1]*~(sa^~(-2)):
tmpsum:=0: tmpsumdom:=0: ind:=true:
for i from 1 to n-1 do
tmpsum:=tmpsum+cond[i]:
tmpsumdom:=tmpsumdom+sa[i]^(-2):
tmax:=(tmpsum-1)/tmpsumdom:
if tmax>=a[1][i+1] then ind:=false: break: fi:
od:
if ind then tmax:=(tmpsum+cond[n]-1)/(tmpsumdom+sa[n]^(-2)): fi:
[seq(max(u[i]-tmax/f[i],0)/f[i],i=1..n)]:
end:
#PermuteList(L,P)
#Inputs:
#L: a list
#P: a permutation of the integers from 1 to nops(L)
#Outputs:
#A new list Lnew consisting of the elements of L ordered under the permutation P
PermuteList:=proc(L,P) local i,n,Lnew:
Lnew:=[]:
n:=nops(L):
for i from 1 to n do
Lnew:=[op(Lnew),L[P[i]]]:
od:
Lnew:
end:
#AverageVector(M):
#Inputs:
#M: a matrix represented as a list of lists
#Outputs:
#v: the mean vector of the set of row vectors of M (calculated component-wise), as a list.
AverageVector:=proc(M) local v,m,n,i,j:
m:=nops(M):
n:=nops(M[1]):
v:=[]:
for i from 1 to n do
v:=[op(v),add(M[j][i],j=1..m)/m]:
od:
v:
end:
#NewCentroids(SDATA):
#Inputs:
#SDATA: a list of lists of lists, where each sublist is the list of vectors (represented as lists) assigned together to a cluster.
#Outputs:
#C: a matrix represented as a list of n lists, with each row consisting of the vector that is the average of the vectors in SDATA[i] for 1<=i<=n.
NewCentroids:=proc(SDATA) local C,n,i:
n:=nops(SDATA):
C:=[]:
for i from 1 to n do
C:=[op(C),AverageVector(SDATA[i])]:
od:
C:
end:
#DumbThreshhold(U):
#Inputs:
#U: a matrix given by a list of lists representing the assignment function of the m1 datapoints to n clusters
#Outputs:
#Uhard: a matrix given by a list of lists which sets the maximal element of each row of U to 1, indicating the assigned cluster, and every other element to 0.
DumbThreshhold:=proc(U) local m1,n,i,Uhard,k:
m1:=nops(U):
n:=nops(U[1]):
Uhard:=[[0$n]$m1]:
for i from 1 to m1 do
k:=ListTools[FindMaximalElement](U[i],position)[2]:
Uhard[i][k]:=1:
od:
Uhard:
end:
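#The projection step can be sanity-checked in isolation. A minimal demo (not one of the original
#auxiliary functions), assuming unit weights, under which PDInner reduces to the standard
#Euclidean projection onto the probability simplex:
DemoPDInner:=proc() local x,i:
x:=PDInner([.3,.2,.9],[1,1,1]):
print(x):
print(add(x[i],i=1..nops(x))):
end:
#Calling DemoPDInner() should print a nonnegative triple whose entries add up to 1.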
#Threshhold(U,s,eta,y): performs smart simplex clustering.
#Inputs:
#U: a matrix given by a list of lists representing the assignment function of the m1 datapoints to n clusters
#s: a positive integer. this parameter indicates the size of the grid that will be created on the simplex to check; larger s will result in more accurate results but longer runtimes.
#eta: a positive number. this parameter indicates the relative weight of stability vs sparsity.
#y: a positive number. this parameter controls the size of the `y-shaped region`, ie how far the datapoints should stay away from the edges of the clusters.
#Outputs:
#Uhard: a matrix given by a list of lists which sets one element of each row of U to 1, indicating the cluster assigned by smart simplex clustering, and every other element to 0.
Threshhold:=proc(U,s,eta,y)
local m1,n,delta,gs,d,fdel,gdel,A,i,mid,temp,cluster,j,c,champ,rec,hope,champdelta,hard,Uhard,c1:
m1:=nops(U):
n:=nops(U[1]):
gs:=Matrix(n*m1,n,1/(s*y)):
delta:=SimplexGrid(n,s):
d:=nops(delta):
fdel:=Matrix(d,n,0):
gdel:=Matrix(d,1,0):
A:=ProjectEdge(U):
for i from 1 to d do
mid:=ProjectEdge(delta[i]):
mid:=convert(linalg[blockmatrix](m1,1,[convert(mid,`matrix`)$m1]),`Matrix`):
cluster:=Matrix(n,1,0):
temp:=A-mid:
temp:=GreaterThanEq(temp,0):
temp:=MTM[sum](temp,2):
temp:=GreaterThanEq(temp,n):
temp:=ArrayTools[Reshape](temp,[n,m1]):
temp:=NoDoubleCounting(temp):
cluster:=FindFlatIndex(temp):
cluster:=LinearAlgebra[Modular][Mod](n,cluster,integer):
c1:=0:
for j from 1 to n-1 do
c:=CountInstances(cluster,j):
c1:=c1+c:
fdel[i,j]:=c/m1:
od:
c:=CountInstances(cluster,0):
c1:=c1+c:
if c1=m1 then
fdel[i,n]:=c/m1:
else
fdel[i]:=convert([1$(n-1),-1],`Vector`):
fi:
temp:=A-mid-gs:
temp:=GreaterThanEq(temp,0):
temp:=MTM[sum](temp,2):
temp:=GreaterThanEq(temp,n-1):
gdel[i,1]:=1-MTM[sum](temp)/m1:
od:
fdel:=MTM[prod](fdel,2):
fdel:=SetAToB(fdel,0,10^(-10)):
fdel:=convert(fdel,`list`):
gdel:=convert(gdel,`list`):
#score each grid point: -log(fdel) rewards balanced clusters, eta*exp(gdel) penalizes instability
if fdel[1]<0 then hope:=infinity: else hope:=evalf(-log(fdel[1])+eta*exp(gdel[1])): fi:
champ:=1:
for i from 2 to d do
if fdel[i]<0 then rec:=infinity: else rec:=evalf(-log(fdel[i])+eta*exp(gdel[i])): fi:
if rec<hope then hope:=rec: champ:=i: fi:
od:
#restored tail (assumed): hard-assign each datapoint from the winning grid point champdelta,
#mirroring the region test used for fdel above
champdelta:=delta[champ]:
mid:=ProjectEdge(champdelta):
mid:=convert(linalg[blockmatrix](m1,1,[convert(mid,`matrix`)$m1]),`Matrix`):
temp:=A-mid:
temp:=GreaterThanEq(temp,0):
temp:=MTM[sum](temp,2):
temp:=GreaterThanEq(temp,n):
temp:=ArrayTools[Reshape](temp,[n,m1]):
temp:=NoDoubleCounting(temp):
Uhard:=[]:
for i from 1 to m1 do
hard:=n:
for j from 1 to n do
if temp[j,i]<>0 then hard:=j: break: fi:
od:
Uhard:=[op(Uhard),ConvertHard(hard,n)]:
od:
Uhard:
end:
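#A quick check of the simple thresholding that Threshhold refines. A minimal demo (not one of
#the original auxiliary functions):
DemoDumbThreshhold:=proc():
DumbThreshhold([[.2,.5,.3],[.7,.1,.2],[.1,.2,.7]]):
end:
#DemoDumbThreshhold() should return [[0,1,0],[1,0,0],[0,0,1]], each row marking its largest entry.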
#GreaterThanEq(temp,a):
#Inputs:
#temp: a matrix or vector
#a: a numeric value
#Outputs:
#a matrix or vector of the same dimensions in which all entries of temp that are >=a are set to 1, and all other entries to 0.
GreaterThanEq:=proc(temp,a) local Ind,t1,t2,i,j:
if type(temp,`Vector`) then
t1:=LinearAlgebra[Dimension](temp):
Ind:=Vector(t1,0):
for i from 1 to t1 do
if temp[i]>=a then Ind[i]:=1: fi:
od:
return(Ind):
else
t1,t2:=LinearAlgebra[Dimension](temp):
fi:
Ind:=Matrix(t1,t2,0):
for i from 1 to t1 do
for j from 1 to t2 do
if temp[i,j]>=a then Ind[i,j]:=1: fi:
od:
od:
Ind:
end:
#NoDoubleCounting(temp):
#Inputs:
#temp: a matrix
#Outputs:
#a matrix identical to temp, except that in each column every value below the first non-zero value is set to 0
NoDoubleCounting:=proc(temp) local T,Tnew,t1,t2,i,j:
T:=LinearAlgebra[Transpose](temp):
t1,t2:=LinearAlgebra[Dimension](T):
Tnew:=Matrix(t1,t2,0):
for i from 1 to t1 do
for j from 1 to t2 do
if T[i,j]<>0 then
Tnew[i]:=convert([0$(j-1),T[i,j],0$(t2-j)],`Vector`):
break:
fi:
if j=t2 then Tnew[i]:=convert([0$t2],`Vector`): fi:
od:
od:
LinearAlgebra[Transpose](Tnew):
end:
#SetAToB(M,a,b):
#Inputs:
#M: a matrix
#a, b: numeric values
#Outputs:
#A matrix identical to M except all instances of a have been replaced with b
SetAToB:=proc(M,a,b) local Mnew,m1,m2,i,j:
if type(M,`Vector`) then
m1:=LinearAlgebra[Dimension](M):
Mnew:=copy(M): #copy so that the input is not modified in place
for i from 1 to m1 do
if M[i]=a then Mnew[i]:=b: fi:
od:
return(Mnew):
else
m1,m2:=LinearAlgebra[Dimension](M):
fi:
Mnew:=copy(M):
for i from 1 to m1 do
for j from 1 to m2 do
if M[i,j]=a then Mnew[i,j]:=b: fi:
od:
od:
Mnew:
end:
#FindFlatIndex(temp)
#Inputs:
#temp: a matrix
#Outputs:
#the flat indices of all nonzero elements, where the entry temp[i,j] is given the flat index i*j
FindFlatIndex:=proc(temp) local Ind,t1,i,t2,j:
t1,t2:=LinearAlgebra[Dimension](temp):
Ind:=[]:
for i from 1 to t1 do
for j from 1 to t2 do
if temp[i,j]<>0 then Ind:=[op(Ind),i*j]: fi:
od:
od:
convert(Ind,`Matrix`):
end:
#CountInstances(temp,k)
#Inputs:
#temp: a matrix
#k: a numeric value
#Outputs:
#c: the number of times `k` appears in temp
CountInstances:=proc(temp,k) local T,c,t1,i:
T:=convert(temp,`list`):
t1:=nops(T):
c:=0:
for i from 1 to t1 do
if T[i]=k then c:=c+1: fi:
od:
c:
end:
#ConvertHard(i,n)
#Inputs:
#i,n: integers such that n>=i>0.
#Outputs:
#a vector given as a list with n entries and the ith entry 1, all others 0.
ConvertHard:=proc(i,n):
[0$(i-1),1,0$(n-i)]:
end:
#SortData(DATA,Uhard):
#Inputs:
#DATA: an m1xm2 matrix represented as a list of lists, with each row consisting of the vectors for each datapoint to be compared.
#Uhard: an m1xn matrix given by a list of lists, where n is the number of clusters for the data to be sorted into. Entries of Uhard are all 0's and 1's, with a single 1 in each row, such that Uhard_ij=1 means that the vector DATA[i] has been sorted into cluster j.
#Note that as we are working in list form, we can input U instead of Uhard and get the same results. The construction of Uhard is useful in code optimization because it allows for usage of sparse matrices.
#Outputs:
#SDATA: a list of lists of lists, where each sublist is the list of vectors (represented as lists) assigned together to a cluster as indicated by Uhard.
SortData:=proc(DATA,Uhard) local m1,n,i,c,SDATA:
m1:=nops(DATA):
n:=nops(Uhard[1]):
SDATA:=[[]$n]:
for i from 1 to m1 do
c:=ListTools[FindMaximalElement](Uhard[i],position)[2]:
SDATA[c]:=[op(SDATA[c]),DATA[i]]:
od:
SDATA:
end:
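#A minimal usage sketch (not one of the original auxiliary functions) of the hard-assignment
#pipeline: threshold a soft assignment, then sort toy datapoints by the resulting clusters.
DemoSortData:=proc() local Uhard:
Uhard:=DumbThreshhold([[.9,.1],[.2,.8],[.6,.4]]):
SortData([[1,2],[3,4],[5,6]],Uhard):
end:
#DemoSortData() should return [[[1,2],[5,6]],[[3,4]]]: the first and third points land in cluster 1, the second in cluster 2.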
#NormalizeData(DATA)
#Inputs:
#DATA: a matrix represented as a list of lists, with each row consisting of the vectors for each datapoint to be compared.
#Outputs:
#a new matrix represented as a list of lists giving a `normalized` version of DATA, where each column has been divided by the maximum absolute value of the column.
NormalizeData:=proc(DATA) local M,m,n,i:
m:=nops(DATA):
n:=nops(DATA[1]):
M:=LinearAlgebra[Transpose](convert(DATA,`Matrix`)):
for i from 1 to n do
M[i]:=LinearAlgebra[Normalize](M[i]):
od:
M:=LinearAlgebra[Transpose](M):
[seq(convert(M[i],`list`),i=1..m)]:
end:
#################################################################################
#################################################################################
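#NLTVPD above calls NonlocalGradient(W,U) and NonlocalDivergence(W,v). The sketches below (not
#the originals; an assumption about the exact form used) illustrate the standard discrete
#nonlocal operators, chosen so that the divergence is the negative adjoint of the gradient:
#(grad_W u)[i][j]=sqrt(W[i][j])*(u[j]-u[i]) and (div_W P)[i]=add of sqrt(W[i][j])*(P[i][j]-P[j][i]) over j.
NonlocalGradientSketch:=proc(W,u) local m1,i,j:
m1:=nops(u):
[seq([seq(sqrt(W[i][j])*(u[j]-u[i]),j=1..m1)],i=1..m1)]:
end:
NonlocalDivergenceSketch:=proc(W,P) local m1,i,j:
m1:=nops(P):
[seq(add(sqrt(W[i][j])*(P[i][j]-P[j][i]),j=1..m1),i=1..m1)]:
end: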