Given two sorted arrays of numbers, we want to find the pair with the kth largest possible sum. (A pair is one element from the first array and one element from the second array). For example, with arrays

- [2, 3, 5, 8, 13]
- [4, 8, 12, 16]

The pairs with the largest sums are

- 13 + 16 = 29
- 13 + 12 = 25
- 8 + 16 = 24
- 13 + 8 = 21
- 5 + 16 = 21
- 8 + 12 = 20

So the pair with the 4th largest sum is (13, 8), tied with (5, 16) at 21. How do we find the pair with the kth largest possible sum?

Also, what is the fastest algorithm? The arrays are already sorted and have sizes M and N.

I am already aware of the **O(k log k)** solution using a max-heap, given here.

It is also a favorite *Google* interview question, and they demand an **O(k) solution**.

I've also read somewhere that there exists an **O(k)** solution, which I am unable to figure out.

Can someone explain the correct solution with pseudocode?

P.S. Please DON'T post this link as an answer/comment. It DOESN'T contain the answer.

I start with a simple but not quite linear-time algorithm. We choose some value between `array1[0]+array2[0]` and `array1[N-1]+array2[N-1]`. Then we determine how many pair sums are greater than this value and how many of them are less. This may be done by iterating the arrays with two pointers: the pointer to the first array is incremented when the sum is too large and the pointer to the second array is decremented when the sum is too small. Repeating this procedure for different values and using binary search (or one-sided binary search), we could find the Kth largest sum in O(N log R) time, where N is the size of the largest array and R is the number of possible values between `array1[N-1]+array2[N-1]` and `array1[0]+array2[0]`. This algorithm has linear time complexity only when the array elements are integers bounded by a small constant.
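The counting step and the binary search over sum values can be sketched as follows. This is only an illustration under my own assumptions: the arrays are sorted in ascending order (as in the question), the elements are integers, and the function names `count_pairs_above` and `kth_largest_sum` are made up for this example.

```python
def count_pairs_above(a, b, threshold):
    """Count pairs (x, y), x from a and y from b, with x + y > threshold.

    Both lists must be sorted in ascending order. Runs in O(len(a) + len(b)).
    """
    count = 0
    j = len(b)  # b[j:] are the elements whose sum with the current x exceeds threshold
    for x in a:  # x only grows, so the boundary j can only move left
        while j > 0 and x + b[j - 1] > threshold:
            j -= 1
        count += len(b) - j
    return count


def kth_largest_sum(a, b, k):
    """Binary search on the sum value; assumes integers and 1 <= k <= len(a) * len(b)."""
    lo = a[0] + b[0]    # smallest possible sum
    hi = a[-1] + b[-1]  # largest possible sum
    # Find the smallest value v such that fewer than k sums are greater than v.
    # That value is exactly the k-th largest sum.
    while lo < hi:
        mid = (lo + hi) // 2
        if count_pairs_above(a, b, mid) < k:
            hi = mid
        else:
            lo = mid + 1
    return lo


print(kth_largest_sum([2, 3, 5, 8, 13], [4, 8, 12, 16], 3))  # 24
```

Each counting pass is linear, and the number of passes is logarithmic in the value range, which is where the O(N log R) bound comes from.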

The previous algorithm may be improved if we stop the binary search as soon as the number of pair sums in the binary search range decreases from O(N^2) to O(N). Then we fill an auxiliary array with these pair sums (this may be done with a slightly modified two-pointers algorithm), and use the quickselect algorithm to find the Kth largest sum in this auxiliary array. All this does not improve worst-case complexity because we still need O(log R) binary search steps. What if we keep the quickselect part of this algorithm but (to get the proper value range) use something better than binary search?

We could estimate the value range with the following trick: take every second element from each array and try to find the pair sum with rank `k/4` for these half-arrays (using the same algorithm recursively). Obviously this should give some approximation for the needed value range. And in fact a slightly improved variant of this trick gives a range containing only O(N) elements. This is proven in the following paper: "Selection in X + Y and matrices with sorted rows and columns" by A. Mirzaian and E. Arjomandi. This paper contains a detailed explanation of the algorithm, proof, complexity analysis, and pseudo-code for all parts of the algorithm except Quickselect. If linear worst-case complexity is required, Quickselect may be augmented with the Median of medians algorithm.

This algorithm has complexity O(N). If one of the arrays is shorter than the other (M < N), we could assume that the shorter array is extended to size N with some very small elements, so that all calculations in the algorithm use the size of the largest array. We don't actually need to extract pairs with these "added" elements and feed them to quickselect, which makes the algorithm a little faster but does not improve asymptotic complexity.

If k < N we could ignore all the array elements with index greater than k. In this case complexity is equal to O(k). If N < k < N(N-1) we just have better complexity than requested in the OP. If k > N(N-1), we'd better solve the opposite problem: the k'th smallest sum.
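The truncation step for small k can be checked with a short sketch. It assumes ascending order as in the question, so "ignoring elements with index greater than k" becomes keeping the last k elements of each array. The helper names `trim_to_top_k` and `kth_largest_sum_bruteforce` are hypothetical, and the brute-force function is there only to compare results, not as part of the linear algorithm.

```python
from itertools import product


def kth_largest_sum_bruteforce(a, b, k):
    """Reference only: sort every pair sum in descending order and pick the k-th."""
    return sorted((x + y for x, y in product(a, b)), reverse=True)[k - 1]


def trim_to_top_k(a, b, k):
    """Keep only the k largest elements of each ascending-sorted list (k >= 1)."""
    return a[-k:], b[-k:]


a = [2, 3, 5, 8, 13]
b = [4, 8, 12, 16]
k = 3
small_a, small_b = trim_to_top_k(a, b, k)
print(small_a, small_b)                                  # [5, 8, 13] [8, 12, 16]
print(kth_largest_sum_bruteforce(small_a, small_b, k))   # 24
print(kth_largest_sum_bruteforce(a, b, k))               # 24
```

The trimming is safe because a pair that uses the r-th largest element of one array is matched or beaten by the r-1 pairs that keep the other element and use a larger one, so elements ranked below k cannot change the k-th largest sum.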

I uploaded a simple C++11 implementation to ideone. The code is not optimized and not thoroughly tested. I tried to make it as close as possible to the pseudo-code in the linked paper. This implementation uses `std::nth_element`, which allows linear complexity only on average (not worst-case).

A completely different approach to finding the K'th sum in linear time is based on a priority queue (PQ). One variation is to insert the largest pair into the PQ, then repeatedly remove the top of the PQ and insert up to two pairs in its place (one with a decremented index in one array, the other with a decremented index in the other array), taking some measures to prevent inserting duplicate pairs. The other variation is to insert all possible pairs containing the largest element of the first array, then repeatedly remove the top of the PQ and insert in its place the pair with a decremented index in the first array and the same index in the second array. In this case there is no need to bother about duplicates.
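The second variation (seed the queue with every pair that uses the largest element of the first array) might look like the sketch below. It uses Python's binary heap from `heapq`, so it runs in O(N + K log N) rather than linear time; the function name `kth_largest_pair` and the ascending-order assumption are mine.

```python
import heapq


def kth_largest_pair(a, b, k):
    """Return (sum, i, j) for the k-th largest a[i] + b[j]; a and b sorted ascending."""
    if not 1 <= k <= len(a) * len(b):
        raise ValueError("k must be between 1 and len(a) * len(b)")
    last = len(a) - 1
    # Seed with every pair that uses the largest element of a.
    # heapq is a min-heap, so the sums are stored negated.
    heap = [(-(a[last] + b[j]), last, j) for j in range(len(b))]
    heapq.heapify(heap)
    for _ in range(k - 1):
        _, i, j = heapq.heappop(heap)
        if i > 0:
            # Same element of b, next smaller element of a. Each pair (i, j)
            # is reachable only from (i + 1, j), so no duplicates can occur.
            heapq.heappush(heap, (-(a[i - 1] + b[j]), i - 1, j))
    neg_sum, i, j = heap[0]
    return -neg_sum, i, j


print(kth_largest_pair([2, 3, 5, 8, 13], [4, 8, 12, 16], 3))  # (24, 3, 3)
```

Popping the top k-1 times also yields the first k sums in sorted order. When k is smaller than the length of `b`, seeding only the k largest elements of `b` would be enough.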

The OP mentions an O(K log K) solution where the PQ is implemented as a max-heap. But in some cases (when the array elements are evenly distributed integers with limited range and linear complexity is needed only on average, not worst-case) we could use an O(1) time priority queue, for example, as described in this paper: "A Complexity O(1) Priority Queue for Event Driven Molecular Dynamics Simulations" by Gerald Paul. This allows O(K) expected time complexity.

The advantage of this approach is the possibility of providing the first K elements in sorted order. The disadvantages are a limited choice of array element type, a more complex and slower algorithm, and worse asymptotic complexity: O(K) > O(N).

**EDIT: This does not work.** I leave the answer, since apparently I am not the only one who could have this kind of idea; see the discussion below. A counter-example is x = (2, 3, 6), y = (1, 4, 5) and k=3, where the algorithm gives 7 (3+4) instead of 8 (3+5).

Let `x` and `y` be the two arrays, sorted in decreasing order; we want to construct the `K`-th largest sum.

The variables are: `i`, the index in the first array (element `x[i]`); `j`, the index in the second array (element `y[j]`); and `k`, the "order" of the sum (`k` in `1..K`), in the sense that `S(k)=x[i]+y[j]` will be the `k`-th greatest sum satisfying your conditions (this is the loop invariant).

Start from `(i, j)` equal to `(0, 0)`: clearly, `S(1) = x[0]+y[0]`.

For `k` from `1` to `K-1`, do:

- if `x[i+1] + y[j] > x[i] + y[j+1]`, then `i := i+1` (and `j` does not change); else `j := j+1`

To see that it works, consider you have `S(k) = x[i] + y[j]`. Then, `S(k+1)` is the greatest sum which is lower than (or equal to) `S(k)`, and such that at least one index (`i` or `j`) changes. It is not difficult to see that exactly one of `i` or `j` should change. If `i` changes, the greatest sum you can construct which is lower than `S(k)` is obtained by setting `i=i+1`, because `x` is decreasing and all the `x[i'] + y[j]` with `i' < i` are greater than `S(k)`. The same holds for `j`, showing that `S(k+1)` is either `x[i+1] + y[j]` or `x[i] + y[j+1]`.

Therefore, at the end of the loop you have found the `K`-th greatest sum.
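The counter-example from the edit note at the top of this answer can be reproduced with a short sketch. The function names are hypothetical, the bounds guards are my addition (the description does not say what to do at the end of an array), and the brute-force function is only a reference to compare against.

```python
from itertools import product


def greedy_kth_sum(x, y, K):
    """Single-path walk described in this answer; x and y sorted in decreasing order.

    Assumes K is small enough that the walk never runs off both arrays.
    """
    i = j = 0
    neg_inf = float("-inf")
    for _ in range(K - 1):
        # Guards (not part of the original description): treat a step past
        # the end of an array as having sum -infinity.
        step_i = x[i + 1] + y[j] if i + 1 < len(x) else neg_inf
        step_j = x[i] + y[j + 1] if j + 1 < len(y) else neg_inf
        if step_i > step_j:
            i += 1
        else:
            j += 1
    return x[i] + y[j]


def brute_force_kth_sum(x, y, K):
    """Reference: sort all pair sums in descending order and take the K-th."""
    return sorted((p + q for p, q in product(x, y)), reverse=True)[K - 1]


x = [6, 3, 2]
y = [5, 4, 1]
print(greedy_kth_sum(x, y, 3))       # 7
print(brute_force_kth_sum(x, y, 3))  # 8
```

The walk follows a single path through the index grid, so after moving from (0, 0) to (0, 1) it can never return to the pair 3 + 5 at (1, 0), which is the true third largest sum.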

tl;dr: If you look ahead and look behind at each iteration, you can start with the end (which is highest) and work back in `O(K)` time.

Although the insight underlying this approach is, I believe, sound, the code below is not quite correct at present (see comments).

Let's see: first of all, the arrays are sorted. So, if the arrays are `a` and `b` with lengths `M` and `N`, and, as you have arranged them, the largest items are in slots `M` and `N` respectively, the largest pair will always be `a[M]+b[N]`.

Now, what's the second largest pair? It's going to have perhaps one of `{a[M],b[N]}` (it can't have both, because that's just the largest pair again), and at least one of `{a[M-1],b[N-1]}`. BUT, we also know that if we choose `a[M-1]+b[N-1]`, we can make one of the operands larger by choosing the higher number from the same list, so it will have exactly one number from the last column, and one from the penultimate column.

Consider the following two arrays: `a = [1, 2, 53]; b = [66, 67, 68]`. Our highest pair is `53+68`. If we lose the smaller of those two, our pair is `68+2`; if we lose the larger, it's `53+67`. So, we have to look ahead to decide what our next pair will be. The simplest lookahead strategy is simply to calculate the sum of both possible pairs. That will always cost two additions, and two comparisons for each transition (three because we need to deal with the case where the sums are equal; let's call that cost `Q`).

At first, I was tempted to repeat that K-1 times. BUT there's a hitch: the next largest pair might actually be the other pair we can validly make from `{{a[M],b[N]}, {a[M-1],b[N-1]}}`. So, we also need to look behind.

So, let's code (python, should be 2/3 compatible):

```
def kth(a,b,k):
    M = len(a)
    N = len(b)
    if k > M*N:
        # the format arguments must be a tuple, otherwise % binds to M alone
        raise ValueError("There are only %s possible pairs; you asked for the %sth largest, which is impossible" % (M*N,k))
    (ia,ib) = M-1,N-1 #0 based arrays
    # we need this for lookback
    nottakenindices = (0,0) # could be any value
    nottakensum = float('-inf')
    for i in range(k-1):
        optionone = a[ia]+b[ib-1]
        optiontwo = a[ia-1]+b[ib]
        biggest = max((optionone,optiontwo))
        #first deal with look behind
        if nottakensum > biggest:
            if optionone == biggest:
                newnottakenindices = (ia,ib-1)
            else: newnottakenindices = (ia-1,ib)
            ia,ib = nottakenindices
            nottakensum = biggest
            nottakenindices = newnottakenindices
        #deal with case where indices hit 0
        elif ia <= 0 and ib <= 0:
            ia = ib = 0
        elif ia <= 0:
            ib-=1
            ia = 0
            nottakensum = float('-inf')
        elif ib <= 0:
            ia-=1
            ib = 0
            nottakensum = float('-inf')
        #lookahead cases
        elif optionone > optiontwo:
            #then choose the first option as our next pair
            nottakensum,nottakenindices = optiontwo,(ia-1,ib)
            ib-=1
        elif optionone < optiontwo: # choose the second
            nottakensum,nottakenindices = optionone,(ia,ib-1)
            ia-=1
        #next two cases apply if options are equal
        elif a[ia] > b[ib]:# drop the smallest
            nottakensum,nottakenindices = optiontwo,(ia-1,ib)
            ib-=1
        else: # might be equal or not - we can choose arbitrarily if equal
            nottakensum,nottakenindices = optionone,(ia,ib-1)
            ia-=1
        #+2 - one for zero-based, one for skipping the 1st largest
        data = (i+2,a[ia],b[ib],a[ia]+b[ib],ia,ib)
        narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
        print (narrative) #this will work in both versions of python
        if ia <= 0 and ib <= 0:
            raise ValueError("Both arrays exhausted before Kth (%sth) pair reached"%data[0])
    return data, narrative
```

For those without python, here's an ideone: http://ideone.com/tfm2MA

At worst, we have 5 comparisons in each iteration, and K-1 iterations, which means that this is an O(K) algorithm.

Now, it might be possible to exploit information about differences between values to optimise this a little bit, but this accomplishes the goal.

Here's a reference implementation (not `O(K)`, but will always work, unless there's a corner case with cases where pairs have equal sums):

```
import itertools
def refkth(a,b,k):
    # each item of the product is ((ia, ea), (ib, eb)); sort by ea+eb, largest first
    (rightia,righta),(rightib,rightb) = sorted(itertools.product(enumerate(a),enumerate(b)), key=lambda pair: pair[0][1]+pair[1][1], reverse=True)[k-1]
    data = k,righta,rightb,righta+rightb,rightia,rightib
    narrative = "%sth largest pair is %s+%s=%s, with indices (%s,%s)" % data
    print (narrative) #this will work in both versions of python
    return data, narrative
```

This calculates the cartesian product of the two arrays (i.e. all possible pairs), sorts them by sum, and takes the kth element. The `enumerate` function decorates each item with its index.

If the last two solutions were at (a1, b1), (a2, b2), then it seems to me there are only four candidate solutions (a1-1, b1) (a1, b1-1) (a2-1, b2) (a2, b2-1). This intuition could be wrong. Surely there are at most four candidates for each coordinate, and the next highest is among the 16 pairs (a in {a1,a2,a1-1,a2-1}, b in {b1,b2,b1-1,b2-1}). That's O(k).

(No it's not, still not sure whether that's possible.)

The max-heap algorithm in the other question is simple, fast and correct. Don't knock it. It's really well explained too. http://stackoverflow.com/a/5212618/284795

It might be that there isn't any O(k) algorithm. That's okay, O(k log k) is almost as fast.
