B ª`Kã@s dZdZddlZddlmZddlmZddlZddlmZm Z ddl m Z ddl Z ddl Z ddlZddlZddlZddlZddlZddlZdd „Zd#d d „ZGd d„deƒZdd„ZdZdZd$dd„Zd%dd„Zd&dd„Zd'dd„Zd(d d!„Zed"kreej  ¡ƒdS))z=Diagnostic functions, mainly for use when doing tech support.ÚMITéN)ÚStringIO)Ú HTMLParser)Ú BeautifulSoupÚ __version__)Úbuilder_registryc CsVtdtƒtdtjƒdddg}x>|D]6}x0tjD]}||jkr6Pq6W| |¡td|ƒq*Wd|krÌ| d¡y*dd l m }td d   t t |jƒ¡ƒWn*tk rÊ}z td ƒWd d }~XYnXd|krydd l}td|jƒWn,tk r}z tdƒWd d }~XYnXt|dƒr4| ¡}nŠ| d¡sL| d¡rdtd|ƒtdƒd Sy:tj |¡rœtd|ƒt|ƒ}| ¡}Wd QRXWntk r´YnXtdƒx’|D]Š}td|ƒd} yt||d} d} Wn8tk r$}ztd|ƒt ¡Wd d }~XYnX| rDtd|ƒt|  ¡ƒtdƒqÄWd S)z¼Diagnostic suite for isolating common problems. :param data: A string containing markup that needs to be explained. :return: None; diagnostics are printed to standard output. z'Diagnostic running on Beautiful Soup %szPython version %sz html.parserÚhtml5libÚlxmlz;I noticed that %s is not installed. Installing it may help.zlxml-xmlr)ÚetreezFound lxml version %sÚ.z.lxml is not installed or couldn't be imported.NzFound html5lib version %sz2html5lib is not installed or couldn't be imported.Úreadzhttp:zhttps:z<"%s" looks like a URL. Beautiful Soup is not an HTTP client.zpYou need to use some other library to get the document behind the URL, and feed that document to Beautiful Soup.z7"%s" looks like a filename. Reading data from the file.Úz#Trying to parse your markup with %sF)ÚfeaturesTz%s could not parse the markup.z#Here's what %s did with the markup:zP--------------------------------------------------------------------------------)ÚprintrÚsysÚversionrZbuildersrÚremoveÚappendr r ÚjoinÚmapÚstrZ LXML_VERSIONÚ ImportErrorrÚhasattrr Ú startswithÚosÚpathÚexistsÚopenÚ ValueErrorrÚ ExceptionÚ tracebackÚ print_excZprettify) ÚdataZ basic_parsersÚnameÚbuilderr ÚerÚfpÚparserÚsuccessÚsoup©r*úl/private/var/folders/fw/jsxvvqfs4sz4tdnfdvg5typ5vk77qg/T/pip-install-p7nfy4dm/beautifulsoup4/bs4/diagnose.pyÚdiagnosesj                     r,TcKsNddlm}x<|jt|ƒfd|i|—ŽD]\}}td||j|jfƒq(WdS)a´Print out the lxml events that occur during parsing. This lets you see how lxml parses a document when no Beautiful Soup code is running. You can use this to determine whether an lxml-specific problem is in Beautiful Soup's lxml tree builders or in lxml itself. :param data: Some markup. :param html: If True, markup will be parsed with lxml's HTML parser. if False, lxml's XML parser will be used. r)r Úhtmlz %s, %4s, %sN)r r Ú iterparserrÚtagÚtext)r"r-Úkwargsr ÚeventÚelementr*r*r+Ú lxml_trace]s $r4c@s`eZdZdZdd„Zdd„Zdd„Zdd „Zd d „Zd d „Z dd„Z dd„Z dd„Z dd„Z dS)ÚAnnouncingParserzèSubclass of HTMLParser that announces parse events, without doing anything else. You can use this to get a picture of how html.parser sees a given document. The easiest way to do this is to call `htmlparser_trace`. cCs t|ƒdS)N)r)ÚselfÚsr*r*r+Ú_puszAnnouncingParser._pcCs| d|¡dS)Nz%s START)r8)r6r#Úattrsr*r*r+Úhandle_starttagxsz AnnouncingParser.handle_starttagcCs| d|¡dS)Nz%s END)r8)r6r#r*r*r+Ú handle_endtag{szAnnouncingParser.handle_endtagcCs| d|¡dS)Nz%s DATA)r8)r6r"r*r*r+Ú handle_data~szAnnouncingParser.handle_datacCs| d|¡dS)Nz %s CHARREF)r8)r6r#r*r*r+Úhandle_charrefszAnnouncingParser.handle_charrefcCs| d|¡dS)Nz %s ENTITYREF)r8)r6r#r*r*r+Úhandle_entityref„sz!AnnouncingParser.handle_entityrefcCs| d|¡dS)Nz %s COMMENT)r8)r6r"r*r*r+Úhandle_comment‡szAnnouncingParser.handle_commentcCs| d|¡dS)Nz%s DECL)r8)r6r"r*r*r+Ú handle_declŠszAnnouncingParser.handle_declcCs| d|¡dS)Nz%s UNKNOWN-DECL)r8)r6r"r*r*r+Ú unknown_declszAnnouncingParser.unknown_declcCs| d|¡dS)Nz%s PI)r8)r6r"r*r*r+Ú handle_piszAnnouncingParser.handle_piN)Ú__name__Ú __module__Ú __qualname__Ú__doc__r8r:r;r<r=r>r?r@rArBr*r*r*r+r5msr5cCstƒ}| |¡dS)zÂPrint out the HTMLParser events that occur during parsing. This lets you see how HTMLParser parses a document when no Beautiful Soup code is running. :param data: Some markup. N)r5Úfeed)r"r'r*r*r+Úhtmlparser_trace“srHZaeiouZbcdfghjklmnpqrstvwxyzécCs>d}x4t|ƒD](}|ddkr$t}nt}|t |¡7}qW|S)z#Generate a random word-like string.r ér)ÚrangeÚ _consonantsÚ_vowelsÚrandomÚchoice)Úlengthr7ÚiÚtr*r*r+Úrword¡s rSécCsd dd„t|ƒDƒ¡S)z'Generate a random sentence-like string.ú css|]}tt dd¡ƒVqdS)rTé N)rSrNÚrandint)Ú.0rQr*r*r+ú ®szrsentence..)rrK)rPr*r*r+Ú rsentence¬srZéècCs¨dddddddg}g}x~t|ƒD]r}t dd ¡}|dkrRt |¡}| d |¡q |d krr| tt d d ¡ƒ¡q |d kr t |¡}| d|¡q Wdd |¡dS)z+Randomly generate an invalid HTML document.ÚpÚdivÚspanrQÚbÚscriptÚtableréz<%s>érTrJzzÚ z)rKrNrWrOrrZr)Ú num_elementsZ tag_namesÚelementsrQrOZtag_namer*r*r+Úrdoc°s   rgé †c Cs(tdtƒt|ƒ}tdt|ƒƒxŽdddgddgD]z}d}y"t ¡}t||ƒ}t ¡}d}Wn6tk r–}ztd |ƒt ¡Wd d }~XYnX|r6td |||fƒq6Wd d l m }t ¡}|  |¡t ¡}td||ƒd d l } |   ¡}t ¡}| |¡t ¡}td||ƒd S)z.Very basic head-to-head performance benchmark.z1Comparative parser benchmark on Beautiful Soup %sz3Generated a large invalid HTML document (%d bytes).r r-rz html.parserFTz%s could not parse the markup.Nz"BS4+%s parsed the markup in %.2fs.r)r z$Raw lxml parsed the markup in %.2fs.z(Raw html5lib parsed the markup in %.2fs.)rrrgÚlenÚtimerrr r!r r ZHTMLrrÚparse) rer"r'r(Úar)r_r%r rr*r*r+Úbenchmark_parsersÂs4      rmr cCsXt ¡}|j}t|ƒ}tt||d}t d|||¡t  |¡}|  d¡|  dd¡dS)z7Use Python's profiler on a randomly generated document.)Úbs4r"r'zbs4.BeautifulSoup(data, parser)Z cumulativez _html5lib|bs4é2N) ÚtempfileÚNamedTemporaryFiler#rgÚdictrnÚcProfileZrunctxÚpstatsZStatsZ sort_statsZ print_stats)rer'Z filehandleÚfilenamer"ÚvarsÚstatsr*r*r+Úprofileâs  rxÚ__main__)T)rI)rT)r[)rh)rhr )!rFÚ __license__rsÚiorÚ html.parserrrnrrZ bs4.builderrrrtrNrprjr rr,r4r5rHrMrLrSrZrgrmrxrCÚstdinr r*r*r*r+Ús8   G &