{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Text generation with Pretrained GPT2 models from Hugging Face on Amazon SageMaker\n",
"## The Poetry of NLP\n",
"\n",
"You’ve just been hired by the Chicago Tribune to start a new poetry column. Congrats! The catch? You need to write a new poem every day. And it can’t just be any old string of syllables, you need it to be fresh, authentic, to resonate with the times and carry a sense of rhyme. You need it to delight your readers, to drive up the Tribune’s daily readership numbers and grow their social media presence. How are you going to accomplish this? With the help of Hugging Face and NLP models on SageMaker of course! "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### In this notebook, we'll execute the following steps.\n",
"\n",
"1. Use the Hugging Face transformers SDK to download pretrained NLP models and test them locally.\n",
"2. Select a dataset from among our favorite authors.\n",
"3. Finetune the pretrained model using SageMaker training.\n",
"4. Save the trained model artifact to S3.\n",
"5. Trigger a pipeline to test and deploy the model onto a multi-model endpoint.\n",
"6. Test your multi-model endpoint locally to write poetry and text in the style of your favorite authors. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please note, this notebook was built on SageMaker Studio, using an ml.t3.medium kernel gateway application and the Python 3.6 PyTorch 1.8 CPU Jupyter kernel."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 0. Install the transformers SDK locally."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%%writefile requirements.txt \n",
"\n",
"transformers==4.6.1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install -r requirements.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 1. Download a pretrained GPT2 model and test locally.\n",
"We're using the Transformers SDK syntax available on the model card here: https://huggingface.co/gpt2 \n",
"\n",
"To make this model even better, we'll use a version of GPT2 that **has already been finetuned to generate poetry!**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import AutoTokenizer, AutoModelForCausalLM\n",
"\n",
"poem_gpt = \"ismaelfaro/gpt2-poems.en\"\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(poem_gpt)\n",
"\n",
"base_model = AutoModelForCausalLM.from_pretrained(poem_gpt)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import set_seed\n",
"\n",
"def get_outputs(sample_outputs, tokenizer):\n",
" # takes a tokenizer, and raw output from the model, decodes these and formats nicely\n",
" rt = []\n",
"\n",
" print(\"Output:\\n\" + 100 * '-')\n",
" for i, sample_output in enumerate(sample_outputs):\n",
" txt = tokenizer.decode(sample_output, skip_special_tokens = True)\n",
" print(\"{}: {}...\".format(i, txt))\n",
" print('')\n",
" rt.append(txt)\n",
" \n",
" return rt\n",
"\n",
"# setting the seed helps us ensure reproducibility. when the seed is consistent, we know the model results will be consistent\n",
"set_seed(42)\n",
"\n",
"text = \"A rose by any other name\"\n",
"\n",
"input_ids = tokenizer.encode(text, return_tensors = 'pt')\n",
"\n",
"sample_outputs = base_model.generate(input_ids,\n",
" do_sample = True, \n",
" max_length = 70,\n",
" num_return_sequences = 5) \n",
"\n",
"generic_outputs = get_outputs(sample_outputs, tokenizer)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interesting and entertaining! Clearly this model knows the form of poetry: it generates short lines separated by newlines, and it seems to pick up some interesting concepts. Now, let's see if we can fine-tune this poem writer to fit the style of an author we have in mind."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2. Fine-tune the GPT2 Poem model with Anne Bradstreet.\n",
"Now, we're going to fine-tune this model using another, much smaller, dataset. Then later we'll use a text classifier trained to evaluate this style of writer, and see how well our new text performs!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you're curious, take a look at some of the top authors in the English language available through this open domain site.\n",
"https://www.public-domain-poetry.com/topauthors.php \n",
"\n",
"For the purposes of this workshop we'll stick to the longer poem pasted below. On your own time, outside of the workshop, you're welcome to modify this to work with a different text.\n",
"\n",
"Poke around at some of the available poems, and copy and paste what you like into the `train.txt` file below. We'll format it for finetuning GPT2 in the next step. In this notebook we're using a poem from Anne Bradstreet, a North American writer from the 17th century.\n",
"\n",
"You may not have known this, but Anne Bradstreet was the first writer to be published in North America!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"jupyter": {
"source_hidden": true
}
},
"outputs": [],
"source": [
"%%writefile train.txt\n",
"\n",
"A Dialogue Between Old England And New\n",
"\n",
" By Anne Bradstreet\n",
"\n",
" New England.\n",
"\n",
" Alas, dear Mother, fairest Queen and best,\n",
" With honour, wealth, and peace happy and blest,\n",
" What ails thee hang thy head, and cross thine arms,\n",
" And sit i' the dust to sigh these sad alarms?\n",
" What deluge of new woes thus over-whelm\n",
" The glories of thy ever famous Realm?\n",
" What means this wailing tone, this mournful guise?\n",
" Ah, tell thy Daughter; she may sympathize.\n",
"\n",
" Old England.\n",
"\n",
" Art ignorant indeed of these my woes,\n",
" Or must my forced tongue these griefs disclose,\n",
" And must my self dissect my tatter'd state,\n",
" Which Amazed Christendom stands wondering at?\n",
" And thou a child, a Limb, and dost not feel\n",
" My weak'ned fainting body now to reel?\n",
" This physic-purging-potion I have taken\n",
" Will bring Consumption or an Ague quaking,\n",
" Unless some Cordial thou fetch from high,\n",
" Which present help may ease my malady.\n",
" If I decease, dost think thou shalt survive?\n",
" Or by my wasting state dost think to thrive?\n",
" Then weigh our case, if 't be not justly sad.\n",
" Let me lament alone, while thou art glad.\n",
"\n",
" New England.\n",
"\n",
" And thus, alas, your state you much deplore\n",
" In general terms, but will not say wherefore.\n",
" What Medicine shall I seek to cure this woe,\n",
" If th' wound's so dangerous, I may not know?\n",
" But you, perhaps, would have me guess it out.\n",
" What, hath some Hengist like that Saxon stout\n",
" By fraud and force usurp'd thy flow'ring crown,\n",
" Or by tempestuous Wars thy fields trod down?\n",
" Or hath Canutus, that brave valiant Dane,\n",
" The regal peaceful Sceptre from thee ta'en?\n",
" Or is 't a Norman whose victorious hand\n",
" With English blood bedews thy conquered Land?\n",
" Or is 't intestine Wars that thus offend?\n",
" Do Maud and Stephen for the Crown contend?\n",
" Do Barons rise and side against their King,\n",
" And call in Foreign aid to help the thing?\n",
" Must Edward be depos'd? Or is 't the hour\n",
" That second Richard must be clapp'd i' th' Tower?\n",
" Or is it the fatal jar, again begun,\n",
" That from the red, white pricking Roses sprung?\n",
" Must Richmond's aid the Nobles now implore\n",
" To come and break the tushes of the Boar?\n",
" If none of these, dear Mother, what's your woe?\n",
" Pray, do not fear Spain's bragging Armado.\n",
" Doth your Ally, fair France, conspire your wrack,\n",
" Or doth the Scots play false behind your back?\n",
" Doth Holland quit you ill for all your love?\n",
" Whence is this storm, from Earth or Heaven above?\n",
" Is 't drought, is 't Famine, or is 't Pestilence?\n",
" Dost feel the smart, or fear the consequence?\n",
" Your humble Child entreats you shew your grief.\n",
" Though Arms nor Purse she hath for your relief—\n",
" Such is her poverty,—yet shall be found\n",
" A suppliant for your help, as she is bound.\n",
"\n",
" Old England.\n",
"\n",
" I must confess some of those Sores you name\n",
" My beauteous Body at this present maim,\n",
" But foreign Foe nor feigned friend I fear,\n",
" For they have work enough, thou knowest, elsewhere.\n",
" Nor is it Alcie's son and Henry's Daughter\n",
" Whose proud contention cause this slaughter;\n",
" Nor Nobles siding to make John no King,\n",
" French Louis unjustly to the Crown to bring;\n",
" No Edward, Richard, to lose rule and life,\n",
" Nor no Lancastrians to renew old strife;\n",
" No Crook-backt Tyrant now usurps the Seat,\n",
" Whose tearing tusks did wound, and kill, and threat.\n",
" No Duke of York nor Earl of March to soil\n",
" Their hands in Kindred's blood whom they did foil;\n",
" No need of Tudor Roses to unite:\n",
" None knows which is the Red or which the White.\n",
" Spain's braving Fleet a second time is sunk.\n",
" France knows how of my fury she hath drunk\n",
" By Edward third and Henry fifth of fame;\n",
" Her Lilies in my Arms avouch the same.\n",
" My Sister Scotland hurts me now no more,\n",
" Though she hath been injurious heretofore.\n",
" What Holland is, I am in some suspense,\n",
" But trust not much unto his Excellence.\n",
" For wants, sure some I feel, but more I fear;\n",
" And for the Pestilence, who knows how near?\n",
" Famine and Plague, two sisters of the Sword,\n",
" Destruction to a Land doth soon afford.\n",
" They're for my punishments ordain'd on high,\n",
" Unless thy tears prevent it speedily.\n",
" But yet I answer not what you demand\n",
" To shew the grievance of my troubled Land.\n",
" Before I tell the effect I'll shew the cause,\n",
" Which are my sins—the breach of sacred Laws:\n",
" Idolatry, supplanter of a Nation,\n",
" With foolish superstitious adoration,\n",
" Are lik'd and countenanc'd by men of might,\n",
" The Gospel is trod down and hath no right.\n",
" Church Offices are sold and bought for gain\n",
" That Pope had hope to find Rome here again.\n",
" For Oaths and Blasphemies did ever ear\n",
" From Beelzebub himself such language hear?\n",
" What scorning of the Saints of the most high!\n",
" What injuries did daily on them lie!\n",
" What false reports, what nick-names did they take,\n",
" Not for their own, but for their Master's sake!\n",
" And thou, poor soul, wast jeer'd among the rest;\n",
" Thy flying for the Truth I made a jest.\n",
" For Sabbath-breaking and for Drunkenness\n",
" Did ever Land profaneness more express?\n",
" From crying bloods yet cleansed am not I,\n",
" Martyrs and others dying causelessly.\n",
" How many Princely heads on blocks laid down\n",
" For nought but title to a fading Crown!\n",
" 'Mongst all the cruelties which I have done,\n",
" Oh, Edward's Babes, and Clarence's hapless Son,\n",
" O Jane, why didst thou die in flow'ring prime?—\n",
" Because of Royal Stem, that was thy crime.\n",
" For Bribery, Adultery, for Thefts, and Lies\n",
" Where is the Nation I can't paralyze?\n",
" With Usury, Extortion, and Oppression,\n",
" These be the Hydras of my stout transgression;\n",
" These be the bitter fountains, heads, and roots\n",
" Whence flow'd the source, the sprigs, the boughs, and fruits.\n",
" Of more than thou canst hear or I relate,\n",
" That with high hand I still did perpetrate,\n",
" For these were threat'ned the woeful day\n",
" I mocked the Preachers, put it fair away.\n",
" The Sermons yet upon record do stand\n",
" That cried destruction to my wicked Land.\n",
" These Prophets' mouths (all the while) was stopt,\n",
" Unworthily, some backs whipt, and ears crept;\n",
" Their reverent cheeks bear the glorious marks\n",
" Of stinking, stigmatizing Romish Clerks;\n",
" Some lost their livings, some in prison pent,\n",
" Some grossly fined, from friends to exile went:\n",
" Their silent tongues to heaven did vengeance cry,\n",
" Who heard their cause, and wrongs judg'd righteously,\n",
" And will repay it sevenfold in my lap.\n",
" This is fore-runner of my after-clap.\n",
" Nor took I warning by my neighbors' falls.\n",
" I saw sad Germany's dismantled walls,\n",
" I saw her people famish'd, Nobles slain,\n",
" Her fruitful land a barren heath remain.\n",
" I saw (unmov'd) her Armies foil'd and fled,\n",
" Wives forc'd, babes toss'd, her houses calcined.\n",
" I saw strong Rochelle yield'd to her foe,\n",
" Thousands of starved Christians there also.\n",
" I saw poor Ireland bleeding out her last,\n",
" Such cruelty as all reports have past.\n",
" Mine heart obdurate stood not yet aghast.\n",
" Now sip I of that cup, and just 't may be\n",
" The bottom dregs reserved are for me.\n",
"\n",
" New England.\n",
"\n",
" To all you've said, sad mother, I assent.\n",
" Your fearful sins great cause there's to lament.\n",
" My guilty hands (in part) hold up with you,\n",
" A sharer in your punishment's my due.\n",
" But all you say amounts to this effect,\n",
" Not what you feel, but what you do expect.\n",
" Pray, in plain terms, what is your present grief?\n",
" Then let's join heads and hands for your relief.\n",
"\n",
" Old England.\n",
"\n",
" Well, to the matter, then. There's grown of late\n",
" 'Twixt King and Peers a question of state:\n",
" Which is the chief, the law, or else the King?\n",
" One saith, it's he; the other, no such thing.\n",
" My better part in Court of Parliament\n",
" To ease my groaning land shew their intent\n",
" To crush the proud, and right to each man deal,\n",
" To help the Church, and stay the Common-Weal.\n",
" So many obstacles comes in their way\n",
" As puts me to a stand what I should say.\n",
" Old customs, new Prerogatives stood on.\n",
" Had they not held law fast, all had been gone,\n",
" Which by their prudence stood them in such stead\n",
" They took high Strafford lower by the head,\n",
" And to their Laud be 't spoke they held 'n th' Tower\n",
" All England's metropolitan that hour.\n",
" This done, an Act they would have passed fain\n",
" No prelate should his Bishopric retain.\n",
" Here tugg'd they hard indeed, for all men saw\n",
" This must be done by Gospel, not by law.\n",
" Next the Militia they urged sore.\n",
" This was denied, I need not say wherefore.\n",
" The King, displeased, at York himself absents.\n",
" They humbly beg return, shew their intents.\n",
" The writing, printing, posting to and fro,\n",
" Shews all was done; I'll therefore let it go.\n",
" But now I come to speak of my disaster.\n",
" Contention's grown 'twixt Subjects and their Master,\n",
" They worded it so long they fell to blows,\n",
" That thousands lay on heaps. Here bleeds my woes.\n",
" I that no wars so many years have known\n",
" Am now destroy'd and slaughter'd by mine own.\n",
" But could the field alone this strife decide,\n",
" One battle, two, or three I might abide,\n",
" But these may be beginnings of more woe—\n",
" Who knows, the worst, the best may overthrow!\n",
" Religion, Gospel, here lies at the stake,\n",
" Pray now, dear child, for sacred Zion's sake,\n",
" Oh, pity me in this sad perturbation,\n",
" My plundered Towns, my houses' devastation,\n",
" My ravisht virgins, and my young men slain,\n",
" My wealthy trading fallen, my dearth of grain.\n",
" The seedtime's come, but Ploughman hath no hope\n",
" Because he knows not who shall inn his crop.\n",
" The poor they want their pay, their children bread,\n",
" Their woful mothers' tears unpitied.\n",
" If any pity in thy heart remain,\n",
" Or any child-like love thou dost retain,\n",
" For my relief now use thy utmost skill,\n",
" And recompense me good for all my ill.\n",
"\n",
" New England.\n",
"\n",
" Dear mother, cease complaints, and wipe your eyes,\n",
" Shake off your dust, cheer up, and now arise.\n",
" You are my mother, nurse, I once your flesh,\n",
" Your sunken bowels gladly would refresh.\n",
" Your griefs I pity much but should do wrong,\n",
" To weep for that we both have pray'd for long,\n",
" To see these latter days of hop'd-for good,\n",
" That Right may have its right, though 't be with blood.\n",
" After dark Popery the day did clear;\n",
" But now the Sun in's brightness shall appear.\n",
" Blest be the Nobles of thy Noble Land\n",
" With (ventur'd lives) for truth's defence that stand.\n",
" Blest be thy Commons, who for Common good\n",
" And thy infringed Laws have boldly stood.\n",
" Blest be thy Counties, who do aid thee still\n",
" With hearts and states to testify their will.\n",
" Blest be thy Preachers, who do cheer thee on.\n",
" Oh, cry: the sword of God and Gideon!\n",
" And shall I not on them wish Mero's curse\n",
" That help thee not with prayers, arms, and purse?\n",
" And for my self, let miseries abound\n",
" If mindless of thy state I e'er be found.\n",
" These are the days the Church's foes to crush,\n",
" To root out Prelates, head, tail, branch, and rush.\n",
" Let's bring Baal's vestments out, to make a fire,\n",
" Their Mitres, Surplices, and all their tire,\n",
" Copes, Rochets, Croziers, and such trash,\n",
" And let their names consume, but let the flash\n",
" Light Christendom, and all the world to see\n",
" We hate Rome's Whore, with all her trumpery.\n",
" Go on, brave Essex, shew whose son thou art,\n",
" Not false to King, nor Country in thy heart,\n",
" But those that hurt his people and his Crown,\n",
" By force expel, destroy, and tread them down.\n",
" Let Gaols be fill'd with th' remnant of that pack,\n",
" And sturdy Tyburn loaded till it crack.\n",
" And ye brave Nobles, chase away all fear,\n",
" And to this blessed Cause closely adhere.\n",
" O mother, can you weep and have such Peers?\n",
" When they are gone, then drown your self in tears,\n",
" If now you weep so much, that then no more\n",
" The briny Ocean will o'erflow your shore.\n",
" These, these are they (I trust) with Charles our king,\n",
" Out of all mists such glorious days will bring\n",
" That dazzled eyes, beholding, much shall wonder\n",
" At that thy settled Peace, thy wealth, and splendour,\n",
" Thy Church and Weal establish'd in such manner\n",
" That all shall joy that thou display'dst thy banner,\n",
" And discipline erected so, I trust,\n",
" That nursing Kings shall come and lick thy dust.\n",
" Then Justice shall in all thy Courts take place\n",
" Without respect of persons or of case.\n",
" Then bribes shall cease, and suits shall not stick long,\n",
" Patience and purse of Clients for to wrong.\n",
" Then High Commissions shall fall to decay,\n",
" And Pursuivants and Catchpoles want their pay.\n",
" So shall thy happy Nation ever flourish,\n",
" When truth and righteousness they thus shall nourish.\n",
" When thus in Peace, thine Armies brave send out\n",
" To sack proud Rome, and all her vassals rout.\n",
" There let thy name, thy fame, and valour shine,\n",
" As did thine Ancestors' in Palestine,\n",
" And let her spoils full pay with int'rest be\n",
" Of what unjustly once she poll'd from thee.\n",
" Of all the woes thou canst let her be sped,\n",
" Execute to th' full the vengeance threatened.\n",
" Bring forth the beast that rul'd the world with's beck,\n",
" And tear his flesh, and set your feet on's neck,\n",
" And make his filthy den so desolate\n",
" To th' 'stonishment of all that knew his state.\n",
" This done, with brandish'd swords to Turkey go,—\n",
" (For then what is it but English blades dare do?)\n",
" And lay her waste, for so's the sacred doom,\n",
" And do to Gog as thou hast done to Rome.\n",
" Oh Abraham's seed, lift up your heads on high,\n",
" For sure the day of your redemption's nigh.\n",
" The scales shall fall from your long blinded eyes,\n",
" And him you shall adore who now despise.\n",
" Then fullness of the Nations in shall flow,\n",
" And Jew and Gentile to one worship go.\n",
" Then follows days of happiness and rest.\n",
" Whose lot doth fall to live therein is blest.\n",
" No Canaanite shall then be found 'n th' land,\n",
" And holiness on horses' bells shall stand.\n",
" If this make way thereto, then sigh no more,\n",
" But if at all thou didst not see 't before.\n",
" Farewell, dear mother; Parliament, prevail,\n",
" And in a while you�ll tell another tale.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3. Format your training data for Hugging Face on Amazon SageMaker.\n",
"Now, let's parse the training data and format it for finetuning GPT2 with Hugging Face on SageMaker. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"data = []\n",
"\n",
"with open('train.txt') as f:\n",
" for row in f.readlines():\n",
" d = row.strip()\n",
" if len(d) > 1:\n",
" data.append(d)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print ('Found {} valid objects in the training data.'.format(len(data)))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print (data[:10])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sagemaker\n",
"\n",
"sess = sagemaker.Session()\n",
"bucket = sess.default_bucket() \n",
"\n",
"train_file_name = 'train.txt'\n",
"s3_train_data = 's3://{}/gpt2/{}'.format(bucket, train_file_name)\n",
"\n",
"!aws s3 cp {train_file_name} {s3_train_data}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sagemaker\n",
"from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig\n",
"\n",
"# gets role for executing training job\n",
"role = sagemaker.get_execution_role()\n",
"hyperparameters = {\n",
" 'model_name_or_path':\"ismaelfaro/gpt2-poems.en\",\n",
" 'output_dir':'/opt/ml/model',\n",
" 'do_train':True,\n",
" 'train_file': '/opt/ml/input/data/train/{}'.format(train_file_name),\n",
" 'num_train_epochs': 5,\n",
" # set batch size to 22 if using SM training compiler\n",
" \"per_device_train_batch_size\": 64,\n",
" # add your remaining hyperparameters\n",
" # more info here https://github.com/huggingface/transformers/tree/v4.6.1/examples/pytorch/language-modeling\n",
"}\n",
"\n",
"# git configuration to download our fine-tuning script\n",
"git_config = {'repo': 'https://github.com/huggingface/transformers.git','branch': 'v4.6.1'}\n",
"\n",
"# creates Hugging Face estimator\n",
"huggingface_estimator = HuggingFace(\n",
" entry_point='run_clm.py',\n",
" source_dir='./examples/pytorch/language-modeling',\n",
" instance_type='ml.p3.2xlarge',\n",
" instance_count=1,\n",
" role=role,\n",
" git_config=git_config,\n",
" transformers_version='4.11.0',\n",
" pytorch_version='1.9.0',\n",
" py_version='py38',\n",
" hyperparameters = hyperparameters,\n",
" # pass the training compiler config to speed up your job\n",
" compiler_config=TrainingCompilerConfig(),\n",
" environment = {'GPU_NUM_DEVICES':'1'},\n",
" disable_profiler = True,\n",
" debugger_hook_config = False\n",
")\n",
"\n",
"# starting the train job\n",
"# should take about 13 minutes to run on current settings\n",
"huggingface_estimator.fit({'train':s3_train_data}, wait = True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 4. Test your trained model locally"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sagemaker.huggingface import HuggingFace\n",
"import time\n",
"\n",
"# redefining if you need to restart your kernel \n",
"# huggingface_estimator = HuggingFace.attach('huggingface-pytorch-trcomp-training-2022-03-14-17-11-36-978')\n",
"try:\n",
" s3_model_data = huggingface_estimator.model_data\n",
" local_model_path = 'gpt2_finetuned'\n",
" \n",
"except:\n",
" time.sleep(5)\n",
" s3_model_data = huggingface_estimator.model_data\n",
" local_model_path = 'gpt2_finetuned'\n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!mkdir {local_model_path}\n",
"!aws s3 cp {s3_model_data} {local_model_path}\n",
"!tar -xvf {local_model_path}/model.tar.gz -C {local_model_path}\n",
"!rm {local_model_path}/model.tar.gz"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import AutoTokenizer, AutoModelForCausalLM\n",
"\n",
"# optional - rerun this if you need to restart your kernel. We are actually using the same tokenizer from before\n",
"tokenizer = AutoTokenizer.from_pretrained(\"gpt2\")\n",
"local_model_path = 'gpt2_finetuned'\n",
"model = AutoModelForCausalLM.from_pretrained(local_model_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# step to make sure we can run inference with this model locally\n",
"model.eval()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import set_seed\n",
"\n",
"set_seed(42)\n",
"\n",
"text = \"A rose by any other name \"\n",
"input_ids = tokenizer.encode(text, return_tensors = 'pt')\n",
"\n",
"sample_outputs = model.generate(input_ids,\n",
" do_sample = True, \n",
" max_length = 70,\n",
" num_return_sequences = 5) \n",
" \n",
"bradsteet_raw = get_outputs(sample_outputs, tokenizer)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interesting, it certainly looks different. Let's see if we can modify this output by invoking the trained model with different parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sample_outputs = model.generate(input_ids, \n",
" max_length=70,\n",
" do_sample=True, \n",
" # sample only from the smallest set of tokens whose cumulative probability exceeds top_p\n",
" top_p=0.85,\n",
" # restrict sampling to the top_k most likely tokens\n",
" top_k=200,\n",
" num_return_sequences = 5) \n",
"\n",
"\n",
"bradstreet_top_85 = get_outputs(sample_outputs, tokenizer)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Wow! Quite a difference - not all of these read much like Bradstreet, and they feel more generic, yet the logical coherence of some of them is strong. Let's try it again with a smaller top_k and a larger top_p."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"sample_outputs = model.generate(input_ids, \n",
" max_length=70,\n",
" do_sample=True, \n",
" # sample only from the smallest set of tokens whose cumulative probability exceeds top_p\n",
" top_p=0.95,\n",
" # restrict sampling to the top_k most likely tokens\n",
" top_k=110,\n",
" num_return_sequences = 5) \n",
"\n",
"\n",
"bradstreet_top_95 = get_outputs(sample_outputs, tokenizer)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interesting - with these settings the output seems even more generic. You can still pick up a hint of that very old English style of writing, and yet more contemporary, social-media-style phrasing comes even more to the surface."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 5. Load a Text Classifier to Quantify Our Generated Text\n",
"Now, we're going to use another model from the HF Hub. This time it's a text classifier, built specifically to give a strong signal for whether or not our text seems like it's in the style of Anne Bradstreet."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from transformers import AutoTokenizer, AutoModelForSequenceClassification\n",
"\n",
"anne_model_name = 'edubz/anne_bradstreet'\n",
"\n",
"anne_tokenizer = AutoTokenizer.from_pretrained(anne_model_name)\n",
"\n",
"anne_clf = AutoModelForSequenceClassification.from_pretrained(anne_model_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from scipy.special import softmax\n",
"\n",
"def invoke_locally(text, anne_clf, anne_tokenizer):\n",
" \n",
" input_ids = anne_tokenizer(text, return_tensors = 'pt')\n",
"\n",
" output = anne_clf(**input_ids)\n",
"\n",
" logits = output['logits'].detach().numpy().tolist()[0]\n",
"\n",
" res = softmax(logits).tolist()\n",
"\n",
" conf = max(res)\n",
"\n",
" label = res.index(conf)\n",
" \n",
" if label == 0:\n",
" label_str = 'Not Anne'\n",
" elif label == 1:\n",
" label_str = 'Anne'\n",
" \n",
" return {'confidence': conf, 'label':label_str }"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"invoke_locally(\"Alas, dear Mother, fairest Queen and best\", anne_clf, anne_tokenizer)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"invoke_locally(\"A rose by any other name\", anne_clf, anne_tokenizer)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"invoke_locally(\"Wow I am enjoying this workshop\", anne_clf, anne_tokenizer)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, run some tests of your own. Try different invocation parameters. What seems to get you the highest Anne scores?"
]
},
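{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, you can score every generated sample in a loop. This is a minimal sketch, assuming the `bradstreet_top_85` list and the `invoke_locally` helper defined earlier are still in memory."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# score each generated sample with the style classifier and print its label and confidence\n",
"for txt in bradstreet_top_85:\n",
"    res = invoke_locally(txt, anne_clf, anne_tokenizer)\n",
"    print('{} ({:.2f}): {}'.format(res['label'], res['confidence'], txt[:60]))"
]
},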
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 6. Deploy your fine-tuned model onto a SageMaker multi-model endpoint\n",
"*Now, if you haven't already, please execute `1_deploy_gpt2_gptj_mme.ipynb`.* \n",
"\n",
"Now, let's deploy this model onto SageMaker. In particular we will save this model to disk, and then load it onto a multi-model endpoint.\n",
"\n",
"\n",
"We'll also list all available models from that endpoint, and test out generating text with each of these. Who knows, maybe we'll stumble on something good enough for the Tribune!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# optional - rerun this if you need to restart your kernel. We are actually using the same tokenizer from before\n",
"\n",
"import sagemaker\n",
"from transformers import AutoTokenizer, AutoModelForCausalLM\n",
"\n",
"sess = sagemaker.Session()\n",
"bucket = sess.default_bucket() \n",
"\n",
"local_model_path = 'gpt2_finetuned'\n",
"\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(\"gpt2\")\n",
"\n",
"model = AutoModelForCausalLM.from_pretrained(local_model_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"bradstreet_path = 'gpt2-bradstreet-model'\n",
"b_model_name = '{}.tar.gz'.format(bradstreet_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"model.save_pretrained('{}/'.format(bradstreet_path))\n",
"tokenizer.save_pretrained('{}/'.format(bradstreet_path))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!tar -czvf {b_model_name} {bradstreet_path}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"prefix = 'gpt2'\n",
"\n",
"!aws s3 cp {b_model_name} s3://{bucket}/{prefix}/{b_model_name}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 7. Test your fine-tuned model on SageMaker multi-model endpoint"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"\n",
"client = boto3.client('sagemaker')\n",
"\n",
"endpoints = client.list_endpoints()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for e in endpoints['Endpoints']:\n",
" name = e['EndpointName']\n",
" if 'mme' in name:\n",
" print (name)\n",
" mme_name = name"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sagemaker\n",
"sess=sagemaker.Session()\n",
"\n",
"predictor = sagemaker.predictor.Predictor(endpoint_name = mme_name, sagemaker_session=sess)\n",
"predictor.serializer = sagemaker.serializers.JSONSerializer()\n",
"predictor.deserializer = sagemaker.deserializers.JSONDeserializer()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# the first time you invoke a model on an MME it will take longer to respond b/c the model is being copied from S3\n",
"predictor.predict({\"inputs\":'A rose by any other name'}, target_model=b_model_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"predictor.predict({\"inputs\":'My country, my land, my home'}, target_model=b_model_name)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 8. Write poetry for the Chicago Tribune\n",
"Now - select your favorite lines from each output from GPT, and pass it in to the model. Feel free to modify the parameters using kwargs. When you are finished, you can submit your poem to our GitHub workshop page!\n",
"\n",
"**Please note** every time you invoke a new model via MME AWS is copying the model artifact from S3 to the SageMaker endpoint. That means **expect a big time delay whenever you invoke a new model.** \n",
"\n",
"One way to get around that is with model compilation, ie running SageMaker Neo to decrease the size, and thereby the runtime, of that model.\n",
"\n",
"In the poem below, I manually copied my favorite line from each output of the model, and fed it in to the generator. I manually pasted all of my favorites into the markdown file you see below.\n",
"\n",
"---"
]
},
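{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can observe the cold-start effect for yourself by timing two consecutive invocations of the same model; the second call should be much faster, because the artifact is already loaded. A quick sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"# the first call pays the S3 copy and model load cost; the second should not\n",
"for attempt in range(2):\n",
"    start = time.time()\n",
"    predictor.predict({'inputs': 'A rose by any other name'}, target_model=b_model_name)\n",
"    print('call {}: {:.1f} seconds'.format(attempt + 1, time.time() - start))"
]
},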
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### My poem - A rose by any other model\n",
"\n",
"A rose by any other name has many meanings.
\n",
"When all that has been presented to us is a form of exaggeration.
\n",
"The language will not preserve.
\n",
"However, the old idea of he who has no business vainly passing by without any other
\n",
"Some unending mizzen, deceived and deceived, seems ever more absurd and likely to harm our time.
\n",
"We tuck his back into the sea which is on the plain almost as soon as we lose sight of him.
\n",
"A mariner shall pass.
\n",
"And I may leave nothing to thee till thou return, for as I said, My hand am strong when thou shouldst require it.
\n",
"This comes out of Kant\\'s conviction that we have nothing in our minds
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text = 'A rose by any other name has many meanings.'\n",
"predictor.predict({\"inputs\":text, \n",
" 'parameters':{'max_length':70,\n",
" 'do_sample':True, \n",
" # only pick tokens at and above this probability level\n",
" 'top_p':0.99,\n",
" # only pick from this many tokens\n",
" 'top_k':600}}, \n",
" target_model=b_model_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text = 'However, the old idea of he who has no business vainly passing by without any other means'\n",
"\n",
"predictor.predict({\"inputs\":text, \n",
" 'parameters':{'max_length':70,\n",
" 'do_sample':True, \n",
" # only pick tokens at and above this probability level\n",
" 'top_p':0.99,\n",
" # only pick from this many tokens\n",
" 'top_k':600}}, \n",
" target_model=b_model_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text = 'If two people try to communicate true love'\n",
"\n",
"predictor.predict({\"inputs\":text, \n",
" 'parameters':{'max_length':70,\n",
" 'do_sample':True, \n",
" # only pick tokens at and above this probability level\n",
" 'top_p':0.99,\n",
" # only pick from this many tokens\n",
" 'top_k':600}}, \n",
" target_model=b_model_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text = 'A rose by any other model'\n",
"\n",
"predictor.predict({\"inputs\":text, \n",
" 'parameters':{'max_length':70,\n",
" 'do_sample':True, \n",
" # only pick tokens at and above this probability level\n",
" 'top_p':0.99,\n",
" # only pick from this many tokens\n",
" 'top_k':100}}, \n",
" target_model=b_model_name)"
]
},
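{
"cell_type": "markdown",
"metadata": {},
"source": [
"Beyond `top_p` and `top_k`, you can also experiment with `temperature`, which rescales the token probabilities before sampling: values below 1.0 make the output more conservative, values above 1.0 make it more surprising. A quick sketch:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"text = 'A rose by any other model'\n",
"\n",
"predictor.predict({'inputs': text,\n",
"                   'parameters': {'max_length': 70,\n",
"                                  'do_sample': True,\n",
"                                  # higher temperature flattens the distribution, giving more adventurous text\n",
"                                  'temperature': 1.2}},\n",
"                  target_model=b_model_name)"
]
},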
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Optional - use a pretrained GPTJ"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Use the Endpoints notebook in this repository to deploy and test a GPT-J 6B endpoint. Compare the generation to that of your fine-tuned GPT-2 model. Add some of the lines to your poem if you like!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Optional- Use Other Pretrained Models and Hugging Face Datasets\n",
"\n",
"**Available pretrained models and datasets from Hugging Face**\n",
"\n",
"**Datasets**\n",
"- Shakespeare:\n",
" - https://huggingface.co/datasets/tiny_shakespeare \n",
"\n",
"**Pretrained models**\n",
"- Chinese Poetry:\n",
" - https://huggingface.co/caixin1998/chinese-poetry-gpt2 \n",
"- Hebrew Poetry:\n",
" - https://huggingface.co/Norod78/hebrew_poetry-gpt_neo-small \n",
"- Arabic Poetry:\n",
" - https://huggingface.co/akhooli/gpt2-small-arabic-poetry \n",
"- Russian Poetry:\n",
" - https://huggingface.co/TuhinColumbia/russianpoetrymany \n",
"- Persian Poetry:\n",
" - https://huggingface.co/mitra-mir/BERT-Persian-Poetry \n",
"- Italian Poetry:\n",
" - https://huggingface.co/TuhinColumbia/italianpoetrymany "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Conclusion - Use Hugging Face and Text Generation on Amazon SageMaker for Your Organization\n",
"Now that you've learned how to test, finetune, deploy and utilize a text generation model on SageMaker, let's understand how to apply that within your organzation.\n",
"\n",
"First, think to yourself, does my organization already produce a lot of written text? Do we write documentation, scripts, blog posts, documents, answers to questions, customer messaging, etc? Odds are, you do. \n",
"\n",
"Then, ask yourself, where do we already have a large volume of written text I can easily access? That may be your existing public documentation, your existing blog posts, etc. First, run through this notebook and use some of your own data to finetune a GPT model. See how well that performs, then consider scaling to large models, including GPT-J. \n",
"\n",
"If you really aren't seeing the performance you want, [consider training a model from scratch!](https://github.com/nlp-with-transformers/notebooks/blob/main/10_transformers-from-scratch.ipynb )\n",
"\n",
"Look at this and other examples within [Hugging Face's SageMaker example notebooks](https://github.com/huggingface/notebooks/tree/master/sagemaker), and similar examples on the [SageMaker repository!](https://github.com/aws/amazon-sagemaker-examples/search?q=hugging+face)\n",
"\n",
"Remember that in order to get the best performance we **combined a variety of computer-generated and human-discriminated approaches**. Further work could improve on this by training discriminator NLP models, such as text classifiers in certain styles, to make the generation and improvement process even faster."
]
}
],
"metadata": {
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Python 3 (PyTorch 1.8 Python 3.6 CPU Optimized)",
"language": "python",
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/1.8.1-cpu-py36"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
}
},
"nbformat": 4,
"nbformat_minor": 4
}