1. Given a set of n distinct elements, we choose two of its subsets S and T, each of size m, uniformly at random.
   a) What is the expected size of the intersection of S and T?
   b) What is the expected Jaccard similarity of the two sets?
   c) What can we do with the above result?

2. Verify that if two sets have an intersection of size m and a symmetric difference of size n, then permuting the rows of their characteristic matrix in all possible ways and computing the fraction of permutations for which their minhash values agree gives exactly the Jaccard similarity of the two sets!

3. We are given the characteristic matrix below and the hash functions h1(x) = (5x+2) mod 6 and h2(x) = (2x+1) mod 6.

   Item | S1 | S2 | S3 | S4
   0    | 0  | 1  | 0  | 1
   1    | 0  | 1  | 0  | 0
   2    | 1  | 0  | 0  | 1
   3    | 0  | 0  | 1  | 0
   4    | 0  | 0  | 1  | 1
   5    | 1  | 0  | 0  | 0

   a) What do the minhash signatures of the sets look like?
   b) Which hash function seems to be more useful?
   c) What are the true Jaccard similarity (distance) between sets S1 and S4 and its approximation based on the minhash signatures?

4. Implement the various distances discussed in the lecture, and use them to determine the mean and the standard deviation of the pairwise distances in the outliers.mat dataset! Try to use a vectorized implementation! You can also use the code in tavolsagVaz.m (which assumes you are not using vectorization)! You can verify the correctness of your implementation by comparing your output to these distances (corresponding to [d(1,1), d(1,2), d(1,3), d(1,4), d(1,5)]):

   L1   -> [0.00000 1.33959 2.26113 3.30099 1.22819]
   L2   -> [0.00000 1.04263 1.59907 2.35431 0.87659]
   Linf -> [0.00000 0.97788 1.14910 1.86789 0.69830]
   cos* -> [0.000000 0.047843 0.005802 0.020406 0.044022]
   mah. -> [0.00000 0.76628 1.28487 1.88149 0.62445]
   *: arccos() of the cosine similarity was used here

Homework
- Finish exercise 4
- Create a variable C in which you store 10 temperature measurements in degrees Celsius!
Also create a variable F containing the same measurements expressed in degrees Fahrenheit, rounded to whole degrees (F=9/5*C+32)! Then create the data matrix M=[C' F'] and visualize it in 2D space. Add two more noisy data points (i.e. data points for which the Celsius->Fahrenheit conversion does not hold) to the data set. Calculate the distance matrix with each of the distance functions you implemented and discuss the results obtained.
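
The homework setup could be sketched roughly as follows. This is an illustrative, non-vectorized sketch in plain Python rather than the MATLAB of the course files, and it is not a reference solution. The temperature values, the two noisy points, and all function names (l1, l2, linf, cos_angle, make_mahalanobis_2d, distance_matrix, mean_offdiag) are assumptions made up for this example. It also assumes the standard conversion F = 9/5*C + 32 and a sample covariance normalized by n-1 for the Mahalanobis distance.

```python
import math

# Hypothetical Celsius readings; any 10 values would do.
C = [-5.0, 0.0, 3.5, 8.0, 12.0, 15.5, 20.0, 24.0, 30.0, 37.0]
# Assumed standard conversion F = 9/5 * C + 32, rounded to whole degrees.
F = [float(round(9.0 / 5.0 * c + 32.0)) for c in C]

# Data matrix M = [C' F'] plus two made-up noisy points that break the conversion.
M = [[c, f] for c, f in zip(C, F)] + [[10.0, 90.0], [25.0, 40.0]]


def l1(x, y):
    """Manhattan (L1) distance."""
    return sum(abs(a - b) for a, b in zip(x, y))


def l2(x, y):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def linf(x, y):
    """Chebyshev (L-infinity) distance."""
    return max(abs(a - b) for a, b in zip(x, y))


def cos_angle(x, y):
    """arccos of the cosine similarity; undefined for zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, dot / (nx * ny))))


def make_mahalanobis_2d(points):
    """Return a Mahalanobis distance for 2D points, using the sample
    covariance (n-1 normalization) of `points`, inverted in closed form."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)            # var of column 1
    c = sum((p[1] - my) ** 2 for p in points) / (n - 1)            # var of column 2
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)   # covariance
    det = a * c - b * b

    def dist(x, y):
        dx, dy = x[0] - y[0], x[1] - y[1]
        q = (c * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
        return math.sqrt(max(0.0, q))

    return dist


def distance_matrix(points, dist):
    """Full pairwise distance matrix as a list of lists."""
    return [[dist(p, q) for q in points] for p in points]


def mean_offdiag(D):
    """Mean of all entries of D except the diagonal."""
    n = len(D)
    vals = [D[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(vals) / len(vals)


metrics = {
    "L1": l1,
    "L2": l2,
    "Linf": linf,
    "cos": cos_angle,
    "mah.": make_mahalanobis_2d(M),
}

for name, dist in metrics.items():
    D = distance_matrix(M, dist)
    print(name, "mean pairwise distance:", round(mean_offdiag(D), 4))
```

When discussing the results, it may help to compare the rows of each distance matrix that belong to the two noisy points with the rows of the regular measurements, since the metrics differ in how strongly they react to points lying off the Celsius-Fahrenheit line.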