o c\@sxdZddlZddlZddlZddlmZddlmZdZdZ GdddZ Gd d d Z Gd d d Z Gd ddZ dS)a This python function is part of the main processing workflow. It contains the data structures and functions required to hold the results of a post-processing run, as well as being responsible for generating the output JSON that is stored in S3. - PCAResults - this is the main parent for the constructs, and is responsible for writing out the results - ConversationAnalytics - holds all of the header-level call and analytical data for the call - TranscribeJobInfo - holds information about the underlying Transcribe job - SpeechSegment - single instance of a speech segment, and PCAResults holds an array of these for the call The output JSON is split into the following high-level structure. +--ConversationalAnalytics | | | +--TranscribeJobInfo | +--SpeechSegment[] Please refer the output_json_structure.md file for full details on the output schema. Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. SPDX-License-Identifier: Apache-2.0 N)datetime)Pathz/tmp/ZinterimResultsc@seZdZdZddZdS) SpeechSegmentz9 Class to hold information about a single speech segment cCs|d|_d|_d|_d|_g|_d|_d|_d|_d|_d|_ g|_ g|_ g|_ d|_ g|_g|_g|_g|_g|_d|_dS)NF)segmentStartTimesegmentEndTimesegmentSpeaker segmentTextsegmentConfidencesegmentSentimentScoreZsegmentPositiveZsegmentNegativesegmentIsPositivesegmentIsNegativesegmentAllSentimentssegmentCustomEntitiessegmentLoudnessScoressegmentInterruptionsegmentIssuesDetectedsegmentActionItemsDetectedsegmentOutcomesDetectedsegmentCategoriesDetectedPresegmentCategoriesDetectedPost segmentIVRselfrd/Users/cmlott/Code/transcribe/amazon-transcribe-post-call-analytics/pca-server/src/pca/pcaresults.py__init__$s( zSpeechSegment.__init__N)__name__ __module__ __qualname____doc__rrrrrr"s rc@s8eZdZdZddZddZddZdd Zd d Zd S) ConversationAnalyticszC Class to hold the header-level analytics information about a call cCsd|_d|_d|_g|_d|_d|_d|_tt |_ d|_ d|_ i|_ g|_g|_i|_g|_d|_g|_g|_g|_d|_t|_dS)Nrr)conversationLanguageCodeguidagent agent_listcustconversationTimeconversationLocationstrrnowprocessingTimeentity_recognizerdurationsentiment_trendsspeaker_labelscustom_entities speaker_timecategories_detectedcombined_graphic_urlissues_detectedactions_detectedoutcomes_detected telephonyTranscribeJobInfotranscribe_jobrrrrr?s* zConversationAnalytics.__init__cC|jS)zQ Returns a reference to the Transcribe job information structure N)r:rrrrget_transcribe_jobVz(ConversationAnalytics.get_transcribe_jobcCs|j|j|j|j|j|j|j|jt|j |j |j |j |j d }|jdkr*|d|d<|j|d<|jjtjkrO|j|d<|j|d<|j|d<|j|d <|j|d <|jd urY|j|d <d |ji}|g|d<|S)aG Generates output JSON for the [ConversationAnalytics] section of the output results document, which includes information about the call, speaker labels, sentiment trends and entities. It also includes the orchestration of the [TranscribeJobInfo] block, as that's included in this one's schema ) GUIDAgentAgentsCustConversationTimeConversationLocation ProcessTime LanguageCodeDuration SpeakerLabelsCustomEntitiesEntityRecognizerNameSentimentTrendsrrDrB SpeakerTimeCategoriesDetectedIssuesDetectedActionItemsDetectedOutcomesDetectedCombinedAnalyticsGraphN Telephonyr9SourceInformation)r$r%r&r'r(r)r,r#r*r.r0r1r-r/r2r:api_modecf API_ANALYTICSr3r5r6r7r4r8create_json_output)rZconv_header_infotranscribe_job_inforrrrV\s8           z(ConversationAnalytics.create_json_outputcCs|d|_|d|_|d|_|d|_|d|_|d|_|d|_t|d|_|d |_ |d |_ |d |_ |d |_ |d |_ d|vrL|d|_d|vrU|d|_d|vrr|d|_|d|_|d|_|d|_|d|_|j|ddddS)z Creates the internal data structures required for the Conversation Analytics data from the supplied JSON fragment. :param json_input: "ConversationAnalytics" block from a PCA results file r>r?rArBrCrDrErFrGrHrIrJrKr@rQrLrMrNrOrPrRrr9N)r$r%r'r(r)r,r#floatr.r0r1r-r/r2r&r8r3r5r6r7r4r:parse_json_input)r json_inputrrrrYs0                   z&ConversationAnalytics.parse_json_inputc Cs(i}g}|dD]R}|t|d|dd}g}|d|dD]/}t|ddt|ddd} || | d |vrG|g|| d <q!|| d |q!||d <||qt|d kr|D]} |D]} | | jkr| j|| 7_|| qkqc|D] } |d j|| 7_q|S)a This will extract and return the header information for detected categories, but it will also inject markers into the SpeechSegments to indicate on which line of the transcript a particular category should be highlighted in a UI @param categories: "Categories" block from the Call Analytics results @param speech_segments: Current speech segment list that this function needs to update @return: JSON structure for header-level "CategoriesDetected" block ZMatchedCategoriesZMatchedDetailsZPointsOfInterest)Name InstancesZBeginOffsetMillisiZEndOffsetMillis)BeginOffsetSecsZ EndOffsetSecsr]Z TimestampsrN) lenrXappendcopykeysrrpopr) r categoriesspeech_segmentsZtimed_categoriesr3Z matched_catZ next_categoryZtimestamp_arrayinstanceZ next_poi_timesegmentZcat_timecategoryrrrextract_analytics_categoriess8        z2ConversationAnalytics.extract_analytics_categoriesN) rrr r!rr<rVrYrirrrrr"=s/ 'r"c@s(eZdZdZddZddZddZdS) r9zB Class to hold the information about an underlying Transcribe job cCs`tj|_d|_d|_d|_d|_d|_d|_d|_ d|_ d|_ d|_ d|_ d|_d|_d|_dS)Nri@rF)rTrUrSstreaming_sessioncompletion_time media_formatmedia_sample_ratemedia_original_urimedia_playback_uricummulative_word_confclm_namecustom_vocab_namevocab_filter_namevocab_filter_methodtranscribe_job_namechannel_identificationredacted_transcriptrrrrrs zTranscribeJobInfo.__init__c Cs|j|j|j|j|j|j|j|j|j|j d }|j dur!|j |d<|j dkr+|j |d<|j dkr5|j |d<|j dkrF|j d|jd|d <|S) z Creates the information about the underlying Transcribe job @return: JSON structure representing the original Transcribe job ) TranscribeApiTypeCompletionTime MediaFormatMediaSampleRateHertzMediaOriginalUriAverageWordConfidence MediaFileUriTranscriptionJobNameRedactedTranscriptChannelIdentificationNStreamingSessionrVocabularyNameCLMNamez []VocabularyFilter)rSrlrmrnrorqrprvrxrwrkrsrrrtru)rrWrrrrVs(      z$TranscribeJobInfo.create_json_outputcCs|d|_|d|_|d|_|d|_|d|_|d|_t|d|_|d|_t |d |_ d |vr:|d |_ d |vrC|d |_ d |vr`|d }| d d|_| dd dd|_d|vrkt|d|_d|vrxt|d|_dSdS)z Creates the internal data structures required for the TranscribeJobInfo data from the supplied JSON fragment. :param json_input: "TranscribeJobInfo" block from a PCA results file ryrzr{r|r}rr~rrrrr r[r^rrrN)rSrlrmrnrorprXrqrvintrwrsrrsplitrtruboolrxrk)rrZZ filter_stringrrrrY"s,         z"TranscribeJobInfo.parse_json_inputN)rrr r!rrVrYrrrrr9s  %r9c@sTeZdZdZdZdZddZddZdd Zd d Z dddZ ddZ dddZ d S) PCAResultszW Class to hold the full structure of the PCA Results, along with reader/writer methods spk_ZUnknown_cCsg|_t|_dS)N)rer" analyticsrrrrrJs zPCAResults.__init__cCs|r|jS|jS)a Returns the pre-defined speaker prefix, which is used based upon whether the caller is dealing with a known or unknown speaker :param known_speaker: Flag set to indicate that we want the prefix for a known caller :return: Speaker prefix text N)KNOWN_SPEAKER_PREFIXUNKNOWN_SPEAKER_PREFIX)rZ known_speakerrrrget_speaker_prefixNszPCAResults.get_speaker_prefixcCr;)z[ Returns a reference to the Conversational Analytics information structure N)rrrrrget_conv_analytics[r=zPCAResults.get_conv_analyticscCsg}|jD]Y}id|jd|jd|jd|jd|jd|jd|jdd d |jd t|j d t|j d |j d|j d|j d|jd|jd|j|j|j|jd}||q|S)zI Creates a list of speech segments for this conversation SegmentStartTimeSegmentEndTimeSegmentSpeakerSegmentInterruption IVRSegment OriginalText DisplayTextZ TextEditedrLoudnessScoresSentimentIsPositiveSentimentIsNegativeSentimentScoreBaseSentimentScoresEntitiesDetectedrLFollowOnCategoriesrM)rNrOWordConfidenceN)rerrr rrr rrr rr rrrrrrrr r`)rrerg next_segmentrrrcreate_output_speech_segmentsasR        z(PCAResults.create_output_speech_segmentsNFc Cst|rtjtj}td|}n|}|}|j|d}td}| ||}|j t t |dd||fS)a~ Writes out the PCA result data to the specified bucket/key location. :param bucket: Bucket where the results are to be uploaded to :param object_key: Name of the output file for the results :param interim: Forcibly writes the key to our interim results folder :return: JSON results object :return: Destination S3 object key /)r"SpeechSegmentss3zUTF-8)BodyN)rT appConfigCONF_S3BUCKET_OUTPUTINTERIM_RESULTS_KEYrrVrboto3resourceZObjectputbytesjsondumpsencode) r object_keybucketZinterimZ dest_bucketZdest_key json_data s3_resourceZ s3_objectrrrwrite_results_to_s3s   zPCAResults.write_results_to_s3cCsi}|jjD]}g||d<qg|j_|jD](}|jr>|jD]}|d}|d}||vr0g||<|||vr=|||qq|D]}t||dkr^|t||||d}|jj|qAdS)a Some telephony post-processing can erase segment-level entities, such as all of those assigned to an IVR speech segment. This method will assume that the speech segments are correct and will re-build the header-level entities appropriately. r[TypeTextr)r[r\ZValuesN)rr1rerr`r_)rZheader_ent_dictZ entity_typergentityZ entity_textZ nextEntityrrrregenerate_header_entitiess.     z%PCAResults.regenerate_header_entitiesc Cs`t|dd}|std}||||t|}tt| ddd}|j |dg|_ |dD]w}t } t|d | _t|d | _|d | _t|d | _|d | _|d| _t|d| _t|d| _t|d| _|d| _|d| _|d| _|d| _|d| _|d| _|d| _|d| _ d|vrt|d| _!|j "| q6dS)Nrr^rrzutf-8)encodingr"rrrrrrrrrrrrrLrrMrNrOrr)#TMP_DIRrrclient download_filerrloadopenabsoluterrYrerrXrrr rrr rr rr rrrrrrrr rr`) rrrofflineZlocal_filename s3_client json_filepathrrZ new_segmentrrrread_results_from_s3s>             zPCAResults.read_results_from_s3)NNF)F) rrr r!rrrrrrrrrrrrrrCs  $!%r)r!rrpcaconfigurationrTrpathlibrrrrr"r9rrrrrs  .Y