Блог пользователя vlchen888

Автор vlchen888, история, 5 лет назад, По-английски

Hey all,

We are excited to invite everyone to the Quora Programming Challenge 2021, a free programming competition open to participants from around the world*! The competition will take place from Saturday, February 6th, 2021 to Sunday, February 7th, 2021, in two divisions of 4 hours each, to accommodate different timezones.

Quora is a platform to ask questions, get useful answers, and share what you know with the world. On this contest, we include some problems inspired by real-world challenges that Quora engineers faced in building and growing the product. In addition to competitive programming-style questions, there will also be some machine learning problems. We hope that this contest will be fun for everyone!

The top contestants in each division will be able to win the following prizes:

  • 1st place: $2000
  • 2nd place: $1000
  • 3rd place: $500
  • 4th — 10th: $200
  • Top 50 finishers: Quora Challenge T-Shirt

Our recruiting team will also be reaching out to top participants for their interest in interviewing for roles at Quora. We recently became a remote-first company, so you don’t have to relocate to work with us.

Register for the competition here by 2/5/2021: https://www.quora.com/careers/challenge.

Hope to see you at the contest!

  • Проголосовать: нравится
  • +127
  • Проголосовать: не нравится

»
5 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

Auto comment: topic has been updated by vlchen888 (previous revision, new revision, compare).

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +9 Проголосовать: не нравится

Deja vu.

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +21 Проголосовать: не нравится

Why is LinkedIn profile a required field?
If it is so much required, why does it accept https://linked.in/none as answer?
Mind that the answer is accurate and truthful to the extent possible, as required in the rules.

  • »
    »
    5 лет назад, скрыть # ^ |
     
    Проголосовать: нравится +79 Проголосовать: не нравится

    It's not really surprising that the form doesn't validate that the URL is a real profile URL, they did some rough validation and that's enough; do you expect forms to also automatically check that your City is a "real city" against some magical database before allowing your submission? No one ever does that.

    LinkedIn profile is required because, clearly, Quora is running this (for free, with real prize money!) as a tool to recruit people to work there.

    • »
      »
      »
      5 лет назад, скрыть # ^ |
       
      Проголосовать: нравится 0 Проголосовать: не нравится

      For important stuff, there's a way to validate an address: "We sent you a mail / message / etc. with a code, please input it in the next form to complete the registration process".

      This one looks mandatory (required field in the form) but not enforced (no validation and no specific mention of LinkedIn profile in the rules, only "fill the form accurately and truthfully"). Hence the question, which is more about the company's intent: if it is really required, it can be clearer in the rules or via validation, if not, the required status can be dropped.

      • »
        »
        »
        »
        5 лет назад, скрыть # ^ |
         
        Проголосовать: нравится +11 Проголосовать: не нравится

        Validation only ever really happens during actual account creation; no one ever validates things like that in registration forms.

        I guess the real question is whether they'll disqualify you if you don't have a LinkedIn? I'm not sure if they will, but from the rules about "accurately and truthfully", it sounds like they can, so I'd just make a LinkedIn and leave it mostly empty.

»
5 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

The problems in the 2014 version of this challenge has a not-strictly-algorithmic challenge, in particular Sorted Set where you have to parallelize a task, which is not common in CP. Are these types of questions fair game under your description of "competitive programming-style questions"?

  • »
    »
    5 лет назад, скрыть # ^ |
     
    Проголосовать: нравится +46 Проголосовать: не нравится

    Hi, thanks for the question! This iteration of the challenge will contain problems that fall into two categories.

    1. “Algorithm” problems: These are competitive programming-style questions that have strictly algorithmic solutions. It is possible to achieve a full score on these problems. Most of the problems will fall into this category.
    2. “Machine Learning” problems: These are problems around machine learning concepts. Scoring is problem-dependent, but roughly would be based on some scaled accuracy metric. We do not guarantee that it is possible to achieve a full score on these problems.
»
5 лет назад, скрыть # |
 
Проголосовать: нравится -33 Проголосовать: не нравится

Since I don't really like Quora as a platform (I believe in the idea, but not the current implementation), is there any way for me to register for the contest without logging into a Quora account? I try to minimize my social media accounts, especially those based around ads and data monetization (which is all of them nowadays, except Microsoft Circles).

Additionally, what's the approximate value of a Quora Challenge T-Shirt? I'm trying to decide whether it's worth attending based on the expected value.

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +13 Проголосовать: не нравится

Reminder that registration for this contest is closing soon!

  • »
    »
    5 лет назад, скрыть # ^ |
     
    Проголосовать: нравится +20 Проголосовать: не нравится

    Am I right that all problems allow partial scoring, and that wrong attempts don't increase penalty? Will there be a limit on a number of submissions per problem?

  • »
    »
    5 лет назад, скрыть # ^ |
     
    Проголосовать: нравится +28 Проголосовать: не нравится

    29 January 2021 — "In the week before the contest, you should receive another email from us with the link to the contest platform and instructions on how to log in. We will use the same link for both divisions of the contest, and you will be able to log in to either one (or both) of the divisions."

    I have not received another email ...

    • »
      »
      »
      5 лет назад, скрыть # ^ |
       
      Проголосовать: нравится 0 Проголосовать: не нравится

      Ok I have received an email

      "The Quora Programming Challenge 2021 is almost upon us!

      Here is the login information that you will need to access the contest (please save this information):

      Contest URL: challenge.quora.com ..."

»
5 лет назад, скрыть # |
Rev. 2  
Проголосовать: нравится 0 Проголосовать: не нравится

Missed registration by some time, thought I will be able to register by 5 Feb IST time :(

»
5 лет назад, скрыть # |
Rev. 3  
Проголосовать: нравится +6 Проголосовать: не нравится
From https://challenge.quora.com/rules

I find this interesting for discussion :)

»
5 лет назад, скрыть # |
Rev. 3  
Проголосовать: нравится +6 Проголосовать: не нравится

Any Java Coders out here, I keep getting the error, "Execution failed because the return code was nonzero". I named my file as Example.java and removed the package and also removed the public keyword from my class. But this continues to show up. I tried submitting a solution using CPP and It got accepted. (BTW this is for the practice session.)

EDIT : (I managed to solve it, my file name had to be "example.java" and not "Example.java")

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +33 Проголосовать: не нравится

Will standings page be available during the contest? I don't see one in practice session.

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +29 Проголосовать: не нравится

"A Participant choosing to participate in both divisions could win prizes in both divisions. Limit one (1) prize per division per Participant.".

So, is it possible that best 50 in both divisions is exactly the same?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +2 Проголосовать: не нравится

FYI, here are the Installed Python packages quoted from the official website:

  • Keras 2.4.3
  • numpy 1.19.5
  • Keras-Preprocessing 1.1.2
  • lightgbm 3.1.1
  • pandas 1.1.5
  • scikit-learn 0.24.1
  • scipy 1.5.4
  • tensorboard 2.4.1
  • tensorboard-plugin-wit 1.8.0
  • tensorflow 2.4.1
  • tensorflow-estimator 2.4.0
  • torch 1.7.1
  • xgboost 1.3.3
»
5 лет назад, скрыть # |
 
Проголосовать: нравится -8 Проголосовать: не нравится

Is there any possibility of late registration for either division ?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +92 Проголосовать: не нравится

anton would've surely rejected atleast 6/7 problems. :)

  • »
    »
    5 лет назад, скрыть # ^ |
     
    Проголосовать: нравится +18 Проголосовать: не нравится

    seeing same type of comments nowadays It feels that anton have redefined competitive programming.

    • »
      »
      »
      5 лет назад, скрыть # ^ |
       
      Проголосовать: нравится +56 Проголосовать: не нравится

      I think this problemset would have been rejected by 2015 coordinators as well.

      • »
        »
        »
        »
        5 лет назад, скрыть # ^ |
        Rev. 4  
        Проголосовать: нравится -18 Проголосовать: не нравится

        why because they were too difficult ? I usually observe that people like kind of easy problem set and dislike difficult one.

        Also notice that format was completely different . It was 4 hrs instead of 2 hrs.

        -is-this-fft- i don't know why i can't reply two times in 10 minutes , So you want to say because problems were kind of popular or well known and thus they were boring ? From "standard" i usually feel well known problems which people solve while learning some topic.

        Was only me who found problemset was only for > master people ?

        • »
          »
          »
          »
          »
          5 лет назад, скрыть # ^ |
           
          Проголосовать: нравится +32 Проголосовать: не нравится

          No, because of problem style. Most of the problems were very standard, even a few years ago. I basically quit the contest out of boredom.

        • »
          »
          »
          »
          »
          5 лет назад, скрыть # ^ |
           
          Проголосовать: нравится +15 Проголосовать: не нравится

          -is-this-fft- i don't know why i can't reply two times in 10 minutes ,

          Probably because of unrated and low contribution.

          So you want to say because problems were kind of popular or well known and thus they were boring ? From "standard" i usually feel well known problems which people solve while learning some topic.

          Yes, that. For example there was a basic "learning König's theorem" problem and a pretty basic "learning HLD" problem. Note that using these techniques to set problems is fine, but not if knowing the thing is basically the whole problem. I also think that it is fine if there is an "educational" problem from time to time, but if all problems are like that then the problemset is bad.

          Was only me who found problemset was only for > master people ?

          Well, not master+, more like expert+. But you're right that this contest wasn't very green-friendly.

»
5 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

In problem C(Problem ID: students), won't the graph always be bipartite? (Solving with this assumption fetched me 21 points though) or was it an implementation mistake?

21 Pointer CODE
»
5 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

Is the wall problem (forgot the exact name) a connected component dp problem? Can't find a valid transition during contest.

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +8 Проголосовать: не нравится

How can people apart from top-50 check their ranks?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +10 Проголосовать: не нравится

Did anyone get an issue on the wall task where on test case 7 the code would throw a runtime error? I know others who had the same problem

  • »
    »
    5 лет назад, скрыть # ^ |
     
    Проголосовать: нравится 0 Проголосовать: не нравится

    I didn't end up with an RTE, but I got WA7 on the first subtask and couldn't finish debugging before the end of the round. I'm also looking for a countercase; I've stress tested my solution on thousands of cases against a slow solution I wrote and the correct solutions I've just been sent and can't find a case that I WA. If anyone happens to have gotten WA7/RTE7, found a countercase, and successfully debugged their solution, that case would be very much appreciated!

    • »
      »
      »
      5 лет назад, скрыть # ^ |
       
      Проголосовать: нравится 0 Проголосовать: не нравится

      Can you show your code?

      • »
        »
        »
        »
        5 лет назад, скрыть # ^ |
         
        Проголосовать: нравится 0 Проголосовать: не нравится

        Sure, here's a link: https://pastebin.com/pRjgYdNH.

        Essentially, I use a state that, after all vertical walls prior to a given column and all horizontal walls in that column have been processed, indicates the connectivity structure of the current column and which connected components contain workers. There are then 22 states, though I track them as values from 0 to 47 to make keeping track of the underlying values easier--essentially, for a given state S, column 0 is in connected component 0, S % 2 is the connected component containing column 1, (S / 2) % 3 is the connected component containing column 2, and S / 6 is a bitmask representing which components contain workers, and 32 possible transitions at each column (as there are five walls that could be removed in each transition). I precalculate all possible transitions, after which my solution is a fairly straightforward DP approach.

        • »
          »
          »
          »
          »
          5 лет назад, скрыть # ^ |
           
          Проголосовать: нравится +13 Проголосовать: не нравится

          I ran a stress-test against your solution compiled with sanitizers, and what I found is that your code fails with RE when $$$n$$$ is $$$1$$$. The problem is that in line 150 you allocate an array of size $$$N-1$$$, and it seems to be not legal (maybe it is undefined behaviour, I am not sure): I get an error "variable length array evaluates to non-positive value 0".

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +7 Проголосовать: не нравится

Will tasks be available to submit somewhere later?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

what was test case 7 of problem escape. it was getting stuck.

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +24 Проголосовать: не нравится

How to solve flipped? My 93 points code: CODE

  • »
    »
    5 лет назад, скрыть # ^ |
     
    Проголосовать: нравится +24 Проголосовать: не нравится

    I got AC by assuming each column originally followed a normal distribution with its post-flip mean and standard deviation and picking the rows where the product of the normal PMF corresponding to each value assuming the row is not flipped divided by the product of the normal PMF corresponding to each value assuming the row is flipped is minimized. Intuitively, I assumed that the flips didn't substantially change the overall distribution of each column and then picked the rows where flipping most drastically increases the probability that the values in the row appear in their corresponding distributions.

    Here's my code (excluding my template): https://pastebin.com/YtYNwPh5

  • »
    »
    5 лет назад, скрыть # ^ |
     
    Проголосовать: нравится +24 Проголосовать: не нравится

    I computed the mean and standard deviation of each column. Then, for each row $$$r$$$, let $$$f(r)$$$ denote the sum of squared distance from 0.5 of all the value of the cumulative normal distribition on each value in the row (drawn from the estimated normal distribution for each column). Do the same for reversed row $$$r'$$$ and finally sort all rows by the value $$$f(r')-f(r)$$$ and output the first $$$b$$$ rows.

»
5 лет назад, скрыть # |
Rev. 2  
Проголосовать: нравится +37 Проголосовать: не нравится

Seems like the problemset is not (yet) publicly available. I uploaded the problemset to my server for public consumption (so those who missed the registration can also follow the discussion).

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +7 Проголосовать: не нравится

Does anyone know the ML techniques that should be used for the last 2 problems?

Also, I was getting RTE for 3 cases in problem Flipped with Cpp. I switched to Java and I was able to pass those cases. I suspect that I had to use the extra memory limit.

  • »
    »
    5 лет назад, скрыть # ^ |
     
    Проголосовать: нравится +13 Проголосовать: не нравится

    For the second last problem, we want to select the reversed rows. A cell is misplaced if its value deviates from the mean of the column. We do want to normalise the deviation by dividing with the standard deviation of the column. The rows with many very misplaced cells are likely reversed.

    We can reverse the b worst rows, and the result was already quite good (around 70?).

    The issue with reversing many rows at once is that we might reverse the wrong row. One way is to reverse one row at a time, and recalculate the mean and standard deviation. But this is too slow.

    The middle ground is to reverse a portion of the b rows, in maybe 16 portions. I scored 99 points (even though it displayed 100 on the console).

    The "ML technique" here is statistical knowledge of mean and standard deviation.

    Code: https://github.com/tonghuikang/codecomp/blob/quora-2021-1/template/f.py

    For the last problem, I used LightGBM (which is one the packages Quora allows, and is faster than XGBoost).

    The challenge is feature engineering. Each data should have the same set of features, but what we see in the data is that each user can visit multiple sites. What I did is to cycle through the visits — each user will revisit the first site again. For the time, I calculated the time difference, and cycle. Each data will have 200+200 features.

    I was prepared to handle more feature engineering and fine-tune the threshold, but I got full marks after a few tries.

    The "ML technique" feature engineering, and trust the black box GBDT model and pray.

    Code: https://github.com/tonghuikang/codecomp/blob/quora-2021-1/template/spam.ipynb

    • »
      »
      »
      5 лет назад, скрыть # ^ |
       
      Проголосовать: нравится +3 Проголосовать: не нравится

      For the last problem, I think trying to do feature engineering is unnecessary. They tell you the exact model used to generate the data (a Markov chain for each user type), and there are pretty much no hidden variables, so you can directly estimate the parameters of the Markov chain using the input data, and then directly compute the Bayesian probabilities of the user types given the test inputs.

»
5 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

For problem Walls, how can we use the size of the grid to our advantage, or is there any algorithm to find the minimum spanning tree of a subgraph?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +55 Проголосовать: не нравится

Will there be editorial?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +3 Проголосовать: не нравится

Sorry for Asking, but can anyone help me with problem B " Escape ". I could not find the idea how to mark all the cell visible to atleast 1 guard any help.

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +1 Проголосовать: не нравится

How to solve problem "Escape". My approach was to first block all the cells which are at the distance <= d from guard. Then I find the length of the shortest path using bfs. It runs on subtask 1 only. Code

  • »
    »
    5 лет назад, скрыть # ^ |
     
    Проголосовать: нравится +1 Проголосовать: не нравится

    Hey, I used multisource BFS, pushed the coordinate of every guard in the queue with their distance, and mark the cell visited if a guard can reach the cell. Now, if our source is visited means at least one guard can reach our initial position and we can't go any further, so "IMPOSSIBLE". If our destination is visited means we can't go to destination, so "IMPOSSIBLE" and if all the neighbors of destination are visited then there is no chance to go to destination cell so the answer is "IMPOSSIBLE". If none of the conditions is satisfied above then apply simple BFS from source to destination. If we can reach the destination cell then print the distance else "IMPOSSIBLE".

»
5 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

Division 1 recently concluded! Congratulations to all the winners and really happy to see all of the interesting discussion here. We hope you enjoyed the Division 1 problems and hope to see you at Division 2, which is starting in less than 6 hours: Feb 7 2021, 01:00 to 05:00 (UTC + 00:00)

»
5 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

Can someone provide me the final ranklist link?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится -15 Проголосовать: не нравится

How tf ML problems are ML?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

Does anyone happen to have the sample I/O for Spam downloaded, and if so, would you mind posting it publicly? I wrote a solution to the problem after the contest ended, and I'm curious about how it would perform on a larger test case.

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +3 Проголосовать: не нравится

Any link for tutorial?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

Can anybody tell the efficient solution for escape problem for full points ? None of the comments in this blog mention the efficient solution .

I think Multi-source BFS would give TLE as well , because you will visit a cell say (x,y) which is being gaurd by multiple enemies , so you will have to mark it multiple times which will waste time , and you cant ignore to mark it multiple times as well as each enemy may have different 'd' .

Please someone throw light on the efficient solution for escape ?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +10 Проголосовать: не нравится

Very surprised that https://en.wikipedia.org/wiki/Strong_connectivity_augmentation is not given as a CP problem before, or maybe my Google-fu skill is bad.

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +2 Проголосовать: не нравится

https://cses.fi/problemset/task/1685 The exact same problem.. This is a joke :/
The quora one got me wrong answer on TC 6.

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +25 Проголосовать: не нравится

Finally tourist and I have something in common — we are the only ones in the top 100 who completely solve malicious before the last hour.

https://www.quora.com/q/quoraprogrammingchallenge/Division-2-One-Hour-Remaining

Code: https://github.com/tonghuikang/codecomp/blob/quora-2021-2/template/malicious.ipynb

»
5 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

Where can the list of all Div1 problems be found ?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

A chance of editorials for the divisions in this contest?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится 0 Проголосовать: не нравится

By any chance if anyone who has downloaded the problemset for Div2 can share it here?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +4 Проголосовать: не нравится

Someone pls share the implementation of Escape and Students problem of DIV-1 Thanks in advance:)

  • »
    »
    5 лет назад, скрыть # ^ |
    Rev. 4  
    Проголосовать: нравится +2 Проголосовать: не нравится

    Here is my code for Problem Escape (excluding my template):

    AC_CODE

    EDIT: It seems the test-cases for this problems were weak, thanks to quinoa for giving a counter case for the above solution. The following solution seems correct to us now, though confirmation of the same is appreciated.

    REVISED_CODE
    • »
      »
      »
      5 лет назад, скрыть # ^ |
      Rev. 4  
      Проголосовать: нравится +2 Проголосовать: не нравится

      Did you get AC with this? I don't think it's correct since you're not prioritizing the nodes with greatest remaining distance when popping them from the queue. For instance it will fail on this testcase. Your code outputs "2" whereas it should say "IMPOSSIBLE".

      5 10 2
      ##########
      #S.......#
      #........#
      #E.......#
      ##########
      3 4 1
      3 9 10
      
      

      Whereas when removing the first guard

      5 10 1
      ##########
      #S.......#
      #........#
      #E.......#
      ##########
      3 9 10
      

      your code gives "IMPOSSIBLE" (which is correct).

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +3 Проголосовать: не нравится

How do we solve datacenter and skyscraper(div-2) optimally?

  • »
    »
    5 лет назад, скрыть # ^ |
     
    Проголосовать: нравится +6 Проголосовать: не нравится

    Datacenter: transform each coordinate (x, y) into (x + y, x — y). The problem is now equivalent to finding the datacenter i that minimizes

    $$$\sum_{j = 1}^n |x_i - x_j| + |y_i - y_j|$$$

    which is much nicer to handle: precompute some prefix / suffix sum and iterate through each possible datacenter as the choice.

    Skyscraper: maintain a segment tree where each node contains the maximum height (and possibly the minimum height, not sure if this is needed). The goal is, for a skyscraper i, to quickly find the minimum and maximum indices L and R such that all skyscrapers within [L; R] have heights no more than that of i. You can adapt the segment tree to quickly find these indices in O(log N) time.

    • »
      »
      »
      5 лет назад, скрыть # ^ |
       
      Проголосовать: нравится +6 Проголосовать: не нравится

      I don't get the solution for Data Center. Can you please elaborate why it is equivalent?

      • »
        »
        »
        »
        5 лет назад, скрыть # ^ |
         
        Проголосовать: нравится 0 Проголосовать: не нравится

        We need to compute the distance of two points $$$(x_1, y_1)$$$ and $$$(x_2, y_2)$$$ under the original and transformed formula. By suitable translation and reflection, we can shift these two points to $$$(0, 0)$$$ and $$$(x, y)$$$, and furthermore we can assume $$$x \ge y \ge 0$$$ (the formula is similar for all other cases).

        Under the original formula, the distance of two points is $$$\max (x, y) = x$$$.

        Under the transformed formula, the distance of two transformed points $$$(0, 0)$$$ and $$$(x + y, x - y)$$$ are $$$(x + y) + (x - y) = 2x$$$. So the new distance under the transformed formula is just double the old distance.

    • »
      »
      »
      5 лет назад, скрыть # ^ |
       
      Проголосовать: нравится 0 Проголосовать: не нравится

      How do you adapt segment tree for this problem?

»
5 лет назад, скрыть # |
 
Проголосовать: нравится -6 Проголосовать: не нравится

Every time there is an asterisk after "world", I automatically assume it excludes Iran. What a shame :(

»
5 лет назад, скрыть # |
Rev. 2  
Проголосовать: нравится +10 Проголосовать: не нравится

How to solve div2 message for any decent score? The English prediction problem.

I know that the dataset is generated from actual English phrases, so I've tried different greedy techniques of picking words based on some brewed up weighted score (frequency, position of the missing word, distance to other words of the same phrase, etc.). But I wasn't able to get anywhere 30+

  • »
    »
    5 лет назад, скрыть # ^ |
     
    Проголосовать: нравится +10 Проголосовать: не нравится

    We were tasked to predict the missing word in the sentence.

    When I want to predict the missing word b in a triplet a,b,c — I consider for every possible b, the frequency of (a,b), the frequency of (b,c) and the freqency of (a,b,c). The candidate word for b that appears the most frequently will be my prediction.

    It makes sense if we give greater weights to (a,b,c) because it is less likely to appear compared to (a,b) or (b,c). I count the freqency every appearance of (a,b,c) 10 times rather than once that I did for (a,b) and (b,c). This scored around 47/100 points.

    To get 78/100 points, I considered two words before and two words after the missing word. In the final minutes of the contest, I was fine-tuning the weights and submitted the code repeatedly.

    Code: https://github.com/tonghuikang/codecomp/blob/quora-2021-2/template/f.py

    It is also possible to consider up to 30 characters before and after for a slightly higher score at a very little increase in complexity, but my code was not structured to do that.

    For the full score, I think we need to use a neural network model. The general problem here is masked language encoding.

    https://keras.io/examples/nlp/masked_language_modeling/

    There are readily available templates out there, but it needs to download billon-parameter models which is not allowed by the online judge. We need to build (or copy) a downsized model that is suitable for the dataset.

  • »
    »
    5 лет назад, скрыть # ^ |
    Rev. 2  
    Проголосовать: нравится +10 Проголосовать: не нравится

    I think I got lucky when hitting high score pretty quickly.

    First I got 70 pts by keeping track of a frequency table of all 2-gram (an ordered pair of word (x, y)), and for each possible candidate w, its score is freq (p, w) * freq (w, r) where p and r are the word before and after w, respectively (if there's no such word p/r or the pair doesn't exist in the frequency table, assign the weight of 1 for that pair).

    Return the word of highest score.

    I got 82 by improving the above idea with including the 3-grams.

»
5 лет назад, скрыть # |
 
Проголосовать: нравится +40 Проголосовать: не нравится

Got my T-shirt (Singapore)

Applied and got an offer, and I have accepted the offer ... Yay