# References

## Hugging Face

- Hugging Face Tutorial: https://huggingface.co/docs/transformers/tasks/question_answering
- Hugging Face on Amazon SageMaker: https://huggingface.co/docs/sagemaker/main
- Hugging Face examples: https://github.com/huggingface/notebooks/tree/master/sagemaker

## Downstream tasks

### Multiclass Classification

- KoELECTRA: https://github.com/monologg/KoELECTRA
- Naver Sentiment Movie Corpus v1.0: https://github.com/e9t/nsmc

### Named Entity Recognition (NER)

- NLP Challenge by Naver and Changwon National University (GitHub): https://github.com/naver/nlp-challenge
- NLP Challenge by Naver and Changwon National University (leaderboard and license): http://air.changwon.ac.kr/?page_id=10

### Question Answering

- KorQuAD 1.0: https://korquad.github.io/KorQuad%201.0/

### Chatbot and Semantic Search using Sentence-BERT (SBERT)

- Sentence-BERT: https://arxiv.org/abs/1908.10084
- SentenceTransformers: https://www.sbert.net
- Chatbot dataset: https://github.com/songys/Chatbot_data
- Billion-scale similarity search with GPUs: https://arxiv.org/pdf/1702.08734.pdf
- Product Quantizers for k-NN Tutorial Part 1: https://mccormickml.com/2017/10/13/product-quantizer-tutorial-part-1
- Product Quantizers for k-NN Tutorial Part 2: http://mccormickml.com/2017/10/22/product-quantizer-tutorial-part-2
- Billion-scale semantic similarity search with FAISS+SBERT: https://towardsdatascience.com/billion-scale-semantic-similarity-search-with-faiss-sbert-c845614962e2
- Korean Contemporary Corpus of Written Sentences: http://nlp.kookmin.ac.kr/kcc/

### Natural Language Inference (NLI)

- KorNLI datasets: https://github.com/kakaobrain/KorNLUDatasets/tree/master/KorNLI
- KLUE: https://github.com/KLUE-benchmark/KLUE

### Summarization

- Naver News Summarization Dataset: https://huggingface.co/datasets/daekeun-ml/naver-news-summarization-ko
- Summarization fine-tuning: https://huggingface.co/docs/transformers/tasks/summarization
- KoBART: https://github.com/SKT-AI/KoBART
- ROUGE (Recall-Oriented Understudy for Gisting Evaluation) metric: https://aclanthology.org/W04-1013.pdf

### Translation

- Translation fine-tuning: https://huggingface.co/docs/transformers/tasks/translation
- KDE4 dataset: https://huggingface.co/datasets/kde4
- Related paper: http://www.lrec-conf.org/proceedings/lrec2012/pdf/463_Paper.pdf
- SacreBLEU (Bilingual Evaluation Understudy) metric: https://github.com/mjpost/sacreBLEU

### TrOCR

- TrOCR: https://arxiv.org/pdf/2109.10282.pdf
- TextRecognitionDataGenerator: https://github.com/Belval/TextRecognitionDataGenerator
- Naver News Summarization Dataset: https://huggingface.co/datasets/daekeun-ml/naver-news-summarization-ko
- Naver Sentiment Movie Corpus v1.0: https://github.com/e9t/nsmc
- Chatbot dataset: https://github.com/songys/Chatbot_data
- Kiwi Python wrapper: https://github.com/bab2min/kiwipiepy