Wednesday, August 30, 2017

NumPy: how to divide each row by its sum efficiently

Method #1: index with None (or np.newaxis) to add a trailing axis, so that the row sums broadcast across each row:
>>> e
array([[ 0.,  1.],
       [ 2.,  4.],
       [ 1.,  5.]])
>>> e/e.sum(axis=1)[:,None]
array([[ 0.        ,  1.        ],
       [ 0.33333333,  0.66666667],
       [ 0.16666667,  0.83333333]])
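
To see why the extra axis is needed, here is a minimal sketch that reuses the array e from above. The names row_sums, col, normalized and broadcast_failed are only illustrative. Dividing by the 1-D sums directly fails because NumPy aligns shapes from the trailing axis, so (3, 2) is compared against (3,):

```python
import numpy as np

e = np.array([[0., 1.],
              [2., 4.],
              [1., 5.]])

row_sums = e.sum(axis=1)      # shape (3,)

# (3, 2) / (3,) compares trailing axes 2 and 3, which do not match.
try:
    e / row_sums
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Adding a trailing axis gives shape (3, 1), which broadcasts across columns.
col = row_sums[:, None]
normalized = e / col

print(broadcast_failed)       # whether the naive division raised
print(col.shape)
print(normalized.sum(axis=1)) # each row should now sum to 1
```

np.newaxis is an alias for None, so row_sums[:, np.newaxis] behaves the same way.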
Method #2: go transpose-happy:
>>> (e.T/e.sum(axis=1)).T
array([[ 0.        ,  1.        ],
       [ 0.33333333,  0.66666667],
       [ 0.16666667,  0.83333333]])
(You can drop the axis= keyword and write e.sum(1) for conciseness, if you want.)
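
As a rough check, the sketch below compares the transpose trick with the None-indexing form on the same array. The variable names are illustrative. The transpose works because e.T has shape (2, 3), whose trailing axis matches the (3,) row sums:

```python
import numpy as np

e = np.array([[0., 1.],
              [2., 4.],
              [1., 5.]])

# e.T is (2, 3); the (3,) sums line up with its trailing axis,
# so column j of e.T (row j of e) is divided by sum j.
via_transpose = (e.T / e.sum(axis=1)).T

# Same computation with an explicit trailing axis and positional axis argument.
via_newaxis = e / e.sum(1)[:, None]

same = np.allclose(via_transpose, via_newaxis)
print(same)
```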
Method #3 (promoted from Jaime's comment; requires NumPy 1.7+):
Use the keepdims argument of sum to keep the summed axis as a length-1 dimension:
>>> e/e.sum(axis=1, keepdims=True)
array([[ 0.        ,  1.        ],
       [ 0.33333333,  0.66666667],
       [ 0.16666667,  0.83333333]])
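
If this is needed in several places, the keepdims form wraps up neatly. The helper below is a hypothetical sketch, and normalize_rows is not a NumPy function. It assumes a 2-D input with no all-zero rows, since a row that sums to zero would produce nan or inf:

```python
import numpy as np

def normalize_rows(a):
    """Divide each row of a 2-D array by that row's sum (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    # keepdims=True leaves the sums with shape (n_rows, 1),
    # so they broadcast against a without any reshaping.
    return a / a.sum(axis=1, keepdims=True)

e = np.array([[0., 1.],
              [2., 4.],
              [1., 5.]])

sums = e.sum(axis=1, keepdims=True)
result = normalize_rows(e)

print(sums.shape)
print(result.sum(axis=1))   # each row should sum to 1
```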
