How do you get LLM-quality skill mapping without LLM-inference costs?

kNNBE is a hybrid model that combines the efficiency of bi-encoders with the precision of k-NN lookups for skill mapping. With it, organizations can efficiently match job descriptions to the appropriate skills. By drawing on labeled synthetic sentences, kNNBE improves accuracy while maintaining speed, which makes it well suited to large-scale projects.

At CIKM 2025, Megagon Labs presented kNNBE: Incorporating Labeled Sentences in Bi-encoder Inference for Fast and Accurate Skill Mapping, a hybrid approach to large-scale skill mapping in HR.

In skill mapping, we assign ontology-defined skills (e.g., ESCO) to job texts. Bi-encoders are fast enough for millions of postings, but struggle with fine-grained, overlapping skills. Cross-encoders and LLM-based rerankers are more accurate, yet too slow and expensive for production-scale pipelines.

๐—ธ๐—ก๐—ก๐—•๐—˜ ๐—ฏ๐—ฟ๐—ถ๐—ฑ๐—ด๐—ฒ๐˜€ ๐˜๐—ต๐—ถ๐˜€ ๐—ด๐—ฎ๐—ฝ.
It keeps the bi-encoder backbone, but augments its score with a ๐—ธ-๐—ก๐—ก ๐—น๐—ผ๐—ผ๐—ธ๐˜‚๐—ฝ ๐—ผ๐˜ƒ๐—ฒ๐—ฟ ๐—น๐—ฎ๐—ฏ๐—ฒ๐—น๐—ฒ๐—ฑ ๐˜€๐˜†๐—ป๐˜๐—ต๐—ฒ๐˜๐—ถ๐—ฐ ๐˜€๐—ฒ๐—ป๐˜๐—ฒ๐—ป๐—ฐ๐—ฒ๐˜€ stored in a memory bank. At inference time, the model:

  • Encodes the input sentence and candidate skills with a bi-encoder
  • Retrieves the k nearest synthetic sentences (each labeled with a skill)
  • Interpolates bi-encoder similarity with similarity to these nearest labeled examples
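The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the released kNNBE implementation: the toy 2-D vectors stand in for real encoder outputs, and the function name `knn_interpolated_scores`, the max aggregation over neighbors, and the linear weight `lam` are all assumptions made for this example.

```python
import numpy as np


def normalize(x):
    """L2-normalize along the last axis so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def knn_interpolated_scores(query, skill_embs, memory_embs, memory_labels, k=2, lam=0.5):
    """Score every candidate skill for one query embedding (hypothetical helper).

    query:         (d,) embedding of the input sentence
    skill_embs:    (n_skills, d) embeddings of the candidate skills
    memory_embs:   (n_mem, d) embeddings of labeled synthetic sentences
    memory_labels: (n_mem,) integer skill index of each memory sentence
    """
    q = normalize(query)

    # Step 1: plain bi-encoder similarity between the query and each skill.
    be_scores = normalize(skill_embs) @ q

    # Step 2: retrieve the k nearest labeled sentences from the memory bank.
    mem_sims = normalize(memory_embs) @ q
    nearest = np.argsort(-mem_sims)[: min(k, len(mem_sims))]

    # Assumed aggregation: best neighbor similarity per skill;
    # skills with no neighbor among the top k keep a score of 0.
    knn_scores = np.zeros(len(skill_embs))
    for idx in nearest:
        label = memory_labels[idx]
        knn_scores[label] = max(knn_scores[label], mem_sims[idx])

    # Step 3: interpolate the two signals (lam=0 recovers the plain bi-encoder).
    return (1.0 - lam) * be_scores + lam * knn_scores


# Toy data: two candidate skills and four labeled memory sentences.
skill_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
memory_embs = np.array([[0.8, 0.6], [0.6, 0.8], [1.0, 0.0], [0.0, 1.0]])
memory_labels = np.array([1, 1, 0, 1])
query = np.array([0.8, 0.6])

be_only = knn_interpolated_scores(query, skill_embs, memory_embs, memory_labels, lam=0.0)
scores = knn_interpolated_scores(query, skill_embs, memory_embs, memory_labels, lam=0.5)
print("bi-encoder only picks skill", int(np.argmax(be_only)))
print("interpolated score picks skill", int(np.argmax(scores)))
```

In this toy setup the bi-encoder alone prefers skill 0, while the nearest labeled sentences both carry label 1 and shift the interpolated ranking toward skill 1. At production scale the exhaustive dot product over the memory bank would presumably be replaced by an approximate nearest-neighbor index.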

This lets the system: 

  • Better separate very similar skills
  • Generalize to unseen skills by updating the memory, not retraining the encoder
  • Stay high-throughput and production-friendly
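To make the "update the memory, not the encoder" point concrete, here is a small sketch of a memory bank that gains a new skill at runtime. The `MemoryBank` class, its methods, the skill names, and the 3-D toy embeddings are hypothetical and chosen only for illustration; in practice the vectors would come from the frozen bi-encoder.

```python
import numpy as np


class MemoryBank:
    """Toy store of (embedding, skill label) pairs; a hypothetical helper."""

    def __init__(self, dim):
        self.embs = np.empty((0, dim))
        self.labels = []

    def add(self, embs, label):
        """Append embeddings of synthetic sentences written for one skill."""
        embs = np.atleast_2d(embs).astype(float)
        embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
        self.embs = np.vstack([self.embs, embs])
        self.labels.extend([label] * len(embs))

    def nearest(self, query, k=1):
        """Return (label, cosine similarity) for the k closest stored sentences."""
        q = query / np.linalg.norm(query)
        sims = self.embs @ q
        top = np.argsort(-sims)[:k]
        return [(self.labels[i], float(sims[i])) for i in top]


bank = MemoryBank(dim=3)
bank.add(np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]), "data analysis")
bank.add(np.array([[0.0, 1.0, 0.0]]), "project management")

query = np.array([0.1, 0.2, 0.9])
before = bank.nearest(query, k=1)[0][0]

# A new skill enters the ontology: only the memory bank changes,
# and the encoder that produced the vectors is left untouched.
bank.add(np.array([[0.0, 0.1, 1.0], [0.1, 0.0, 1.0]]), "prompt engineering")
after = bank.nearest(query, k=1)[0][0]

print("nearest label before update:", before)
print("nearest label after update:", after)
```

The same query is matched to a newly added skill as soon as its labeled sentences are stored, with no training step in between.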

On three benchmark datasets built from real job postings, kNNBE rivals state-of-the-art rerankers in accuracy while remaining orders of magnitude faster, making it practical for labor-market-scale deployments.

We've also open-sourced kNNBE on the Megagon Labs GitHub, so research scientists and engineers can plug this hybrid memory + bi-encoder idea into their own retrieval and skill-mapping stacks.

If you're exploring retrieval, representation learning, or HR tech, this is an example of innovation through research with a clear path to real-world, human-centered impact.

Read the paper!
