I've got a large matrix stored as a scipy.sparse.csc_matrix and want to subtract a column vector from each one of the columns in the large matrix. This is a pretty common task when you're doing things like normalization/standardization, but I can't seem to find the proper way to do this efficiently.

Here's an example to demonstrate:

```
import scipy.sparse

# mat is a 3x3 matrix
mat = scipy.sparse.csc_matrix([[1, 2, 3],
                               [2, 3, 4],
                               [3, 4, 5]])
# vec is a 3x1 matrix (or a column vector)
vec = scipy.sparse.csc_matrix([1, 2, 3]).T
"""
I want to subtract `vec` from each of the columns in `mat` yielding...
[[0, 1, 2],
 [0, 1, 2],
 [0, 1, 2]]
"""
```

One way to accomplish what I want is to hstack `vec` to itself 3 times, yielding a 3x3 matrix where each column is `vec`, and then subtract that from `mat`. But again, I'm looking for a way to do this efficiently, and the hstacked matrix takes a long time to create. I'm sure there's some magical way to do this with slicing and broadcasting, but it eludes me.

Thanks!

EDIT: Removed the 'in-place' constraint, because sparsity structure would be constantly changing in an in-place assignment scenario.

You can introduce fake dimensions by altering the `strides` of your vector. You can, without additional allocation, "convert" your vector to a 3 x 3 matrix using `np.lib.stride_tricks.as_strided`. This page has an example and a bit of discussion about it, along with some discussion of related topics (like views). Search the page for "Example: fake dimensions with strides."

There are also quite a few examples on SO about this, but my searching skills are failing me now.
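To make the stride trick concrete, here is a minimal sketch. It assumes the vector is a dense NumPy array (the trick does not apply to a `scipy.sparse` object directly), and the names `tiled` and `same` are only illustrative. A stride of 0 along the second axis makes every column read the same memory, so no copy is made; `np.broadcast_to` is shown next to it as a safer way to get the same read-only view.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

vec = np.array([1, 2, 3])

# Stride 0 along axis 1: stepping to the next column does not move in memory,
# so every column is the original vector. writeable=False guards against
# accidental writes through the overlapping view.
tiled = as_strided(vec, shape=(3, 3), strides=(vec.strides[0], 0),
                   writeable=False)

# Same view via the public broadcasting helper (also read-only, no copy).
same = np.broadcast_to(vec[:, np.newaxis], (3, 3))

print(tiled)
```

Either view can then be subtracted from a dense matrix; for a sparse `mat` the result of such a subtraction is generally dense.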

For a start what would we do with dense arrays?

```
mat - vec.A              # taking advantage of broadcasting
mat - vec.A[:, [0]*3]    # explicit broadcasting
mat - vec[:, [0, 0, 0]]  # that also works with csr matrix
```
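As a runnable version of the idea above, the following sketch rebuilds the question's `mat` and `vec` and tries the broadcasting form and the column-repeating form. It uses `.toarray()` in place of `.A`; the variable names `dense_result` and `sparse_result` are only for illustration.

```python
import numpy as np
import scipy.sparse as sp

mat = sp.csc_matrix([[1, 2, 3],
                     [2, 3, 4],
                     [3, 4, 5]])
vec = sp.csc_matrix([1, 2, 3]).T  # 3x1 sparse column

# Sparse minus dense: the 3x1 array is broadcast across the columns and the
# result comes back dense.
dense_result = np.asarray(mat - vec.toarray())

# Sparse minus sparse: repeat the single column of vec three times by fancy
# indexing, so both operands are 3x3 and the result stays sparse.
sparse_result = mat - vec[:, [0, 0, 0]]

print(dense_result)
print(sparse_result.toarray())
```

The first form materializes a dense result, so it is only practical when a dense matrix of that shape fits in memory.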

In http://codereview.stackexchange.com/questions/32664/numpy-scipy-optimization/33566 we found that using `as_strided` on the `mat.indptr` vector is the most efficient way of stepping through the rows of a sparse matrix. (The `x.rows`, `x.cols` of an `lil_matrix` are nearly as good. `getrow` is slow.) This function implements such an iteration.

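```
from numpy.lib.stride_tricks import as_strided

def subtract_rows(X, v):
    # X must be CSR so that indptr delimits rows; X.data is modified in place
    rows, cols = X.shape
    # view indptr as overlapping (start, stop) pairs, one pair per row
    row_start_stop = as_strided(X.indptr, shape=(rows, 2),
                                strides=2*X.indptr.strides)
    for row, (start, stop) in enumerate(row_start_stop):
        data = X.data[start:stop]  # a view, so -= writes back into X.data
        data -= v[row]

mat = mat.tocsr()
subtract_rows(mat, vec.A)
print(mat.A)
```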

I'm using `vec.A` for simplicity. If we keep `vec` sparse we'd have to add a test for a nonzero value at `row`. Also, this type of iteration only modifies the nonzero elements of `mat`; the zeros are unchanged.

I suspect the time advantages will depend a lot on the sparsity of the matrix and vector. If `vec` has lots of zeros, then it makes sense to iterate, modifying only those rows of `mat` where `vec` is nonzero. But if `vec` is nearly dense, as in this example, it may be hard to beat `mat-vec.A`.
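The case where `vec` stays sparse could be sketched as below. This is only an illustration under stated assumptions: the helper name `subtract_from_stored` is hypothetical, it works on a CSR copy instead of modifying its argument, and, as noted above, it touches only the stored (nonzero) entries of the matrix.

```python
import scipy.sparse as sp

def subtract_from_stored(X, vec):
    """Subtract a sparse column vector from the stored entries of X, row by row."""
    X = sp.csr_matrix(X, copy=True)  # CSR copy: indptr delimits rows
    v = sp.coo_matrix(vec)           # COO exposes the nonzero rows and values
    for row, value in zip(v.row, v.data):
        start, stop = X.indptr[row], X.indptr[row + 1]
        X.data[start:stop] -= value  # rows where vec is zero are never visited
    return X

mat = sp.csc_matrix([[1, 0, 3],
                     [2, 3, 0],
                     [0, 4, 5]])
vec = sp.csc_matrix([1, 0, 3]).T     # the middle entry is an implicit zero

result = subtract_from_stored(mat, vec)
print(result.toarray())
```

The loop runs once per nonzero of `vec`, so its cost scales with the sparsity of the vector, not with the number of rows.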

So in short, if you use CSR instead of CSC, it's a one-liner:

```
mat.data -= numpy.repeat(vec.toarray().ravel(), numpy.diff(mat.indptr))
```

Notice that this is better done row-wise, since we subtract the same number from every element of a row. In your example: subtract 1 from the first row, 2 from the second row, and 3 from the third row.

I actually encountered this in a real-life application where I wanted to classify documents, each represented as a row in the matrix, while the columns represent words. Each document has a score by which the score of each word in that document should be multiplied. Using the row representation of the sparse matrix, I did something similar to this (I modified my code to answer your question):

```
import numpy
import scipy.sparse

mat = scipy.sparse.csc_matrix([[1, 2, 3],
                               [2, 3, 4],
                               [3, 4, 5]])
# vec is a 3x1 matrix (or a column vector)
vec = scipy.sparse.csc_matrix([1, 2, 3]).T
# Use the row version
mat_row = mat.tocsr()
vec_row = vec.T
# mat_row.data contains the stored values in a 1d array, in row-wise order from top left to bottom right.
# mat_row.indptr (an n+1 element array) contains the offset in mat_row.data where each row starts, plus the end of the array
# By taking the difference, we get the number of stored elements per row, and repeat each element of the row vector that many times
mat_row.data -= numpy.repeat(vec_row.toarray()[0], numpy.diff(mat_row.indptr))
print(mat_row.todense())
```

Which results in:

[[0 1 2] [0 1 2] [0 1 2]]

The visualization is something like this:

```
>>> mat_row.data
[1 2 3 2 3 4 3 4 5]
>>> mat_row.indptr
[0 3 6 9]
>>> numpy.diff(mat_row.indptr)
[3 3 3]
>>> numpy.repeat(vec_row.toarray()[0],numpy.diff(mat_row.indptr))
[1 1 1 2 2 2 3 3 3]
>>> mat_row.data -= numpy.repeat(vec_row.toarray()[0],numpy.diff(mat_row.indptr))
>>> mat_row.data
[0 1 2 0 1 2 0 1 2]
>>> mat_row.todense()
[[0 1 2]
 [0 1 2]
 [0 1 2]]
```
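Since the question started from a CSC matrix, here is a possible variant that skips the conversion. It is not taken from the answers above and is only a sketch: in CSC format, `indices` holds the row index of every stored entry, so indexing a dense copy of the vector with it lines up one value per stored entry. The name `v` and the example matrix are illustrative, and like the CSR one-liner this changes stored entries only.

```python
import numpy as np
import scipy.sparse as sp

mat = sp.csc_matrix([[1, 0, 3],
                     [2, 3, 0],
                     [0, 4, 5]])
vec = sp.csc_matrix([1, 2, 3]).T
v = vec.toarray().ravel()            # dense 1-D copy of the column vector

# CSC route: mat.indices[k] is the row of the k-th stored value.
csc_result = mat.copy()
csc_result.data -= v[csc_result.indices]

# CSR route from the answer above, for comparison.
csr_result = mat.tocsr()
csr_result.data -= np.repeat(v, np.diff(csr_result.indptr))

print(csc_result.toarray())
print(csr_result.toarray())
```

Both routes leave the sparsity pattern untouched, so positions that were zero in `mat` stay zero instead of becoming `-v[row]`.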
