Wednesday, July 8, 2020

Uber Interview Questions Weighted Random Numbers

Uber Interview Questions Weighted Random Numbers This week, were going to discuss a recent Uber interview question weighted random numbers. This is also one of my favorite questions to ask and Im surprised at how many people failed this problem. In a nutshell, well talk about a couple topics in this post, including sampling, basic data structures, time/space complexity and tips for coding this question. Question Write a function that returns values randomly, according to their weight. Let me give you an example. Suppose we have 3 elements with their weights: A (1), B (1) and C (2). The function should return A with probability 25%, B with 25% and C with 50% based on the weights. The answer is not obvious, but its not too hard to think. Also, writing bug-free code would fail majority candidates. This is really the perfect question for coding interviews. If you havent solved this problem before, Ill give you 20min for both thinking and coding. Please try to solve it before reading the analysis below. Analysis The problem seems quite simple at first glance, but converting your thought into the algorithm is hard. When you get stuck, a practical tip is to try with simple examples, which works pretty well for this type of question. Lets use the same example above. If we do a random sample regardless of the weight, well end up give all of them the probability of 1/3. As wed like to double the chance of selecting element C, an intuitive approach is to have two elements C in the “bag” and do the sampling again. In other words, we have {A, B, C, C} in the set now and we can again do the random sample from it. With that in mind, a naive solution is to duplicate each element with the time of its weight and then do the random sampling from the new set. There are two potential problems here, maybe you can think about it first. A more general approach Two problems of the above solution: This wont work when the weight is not an integer. If the weight is large, it uses too much memory as it stores the same element for many times. Lets think about a more general approach. In essence, we can denote the new set above as a horizontal line like this: The sampling is like randomly select a point and see which area it falls into. Going into this idea, we can have the following algorithm: W is the sum of all the weights (length of the horizontal line) Get a random number R from [0, W] (randomly select a point) Go over each element in order and keep the sum of weights of visited elements. Once the sum is larger than R, return the current element. This is finding which area includes the point. Complexity optimization The time/space complexity should be extremely straightforward. Since we are not using extra memory, the space complexity is O(1). Time complexity is O(N), because we have to iterate over all the elements in both step 1 and 3. Its worth to note that even if we can put step 1 in preprocessing, we still have to do the traversal in step 3. Any way to optimize this? If making step 3 to O(1) is hard, lets try make it O(logN). The first thing we should consider is binary search. If we keep an array that each element is the accumulative sum of weights of all the previous elements, we can do a binary search to locate the element. This improves the algorithm to O(logN) (excluding the preprocessing). If we want to make it O(1), we may need to add some constraints. As weve been mentioning for so many times, the tradeoff between time and space complexity provides a lot of hints about optimization. Obviously, wed like to speed it up and we can think about how to use more memories here. The most time-consuming part is at step 3 and we need to have an approach of O(1) to get the corresponding area. Lets say if we preprocessing all the elements once by keeping an array M where M[i] stores the element corresponding to point i. This allows us to return the area immediately from the array. In essence, this approach is like storing results for every single input in a hashmap and it has space complexity O[W] where W is the sum of all the weights. Tips for coding It should be very straightforward to finish the code. I would suggest everyone take this as a practice. Here is a couple of tips for this question: Be careful about the edge case. When comparing the current sum to the random number, should you use “”, “=” or it makes no difference? If you could define a class, itll be better to have the preprocessing code that calculates the sum of all the weights. An extra question, how do you test your function? You can leave your comment here. Summary A surprising fact is that most interview questions are easier than people expected ,however, whats more surprising is that so many people failed in those “simple” questions. The weighted random numbers question is a great example. No one thinks its hard, but few can actually solve it perfectly. In fact, this is the type of questions people should spend the most time on. By the way, if you want to have more guidance from experienced interviewers, you can check Gainlo that allows you to have mock interview with engineers from Google, Facebook ,etc..

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.