U fbg@sddlmZmZmZddlmZmZmZddlm Z ddl m Z m Z ddl ZddlZddlmZddlZddlZddlZddlZddlZdd d ZdddZdddZdddZddZdddZdS))_check_df_load _check_geomget_files_recursively)bbox_corners_to_cocopolygon_to_cocosplit_multi_geometries)_get_logging_level)geojson_to_px_gdfremove_multipolygonsN)tqdm.tifTFc% Cstt}|tt||dt|tr| drx|dt |d"}t |}dd|dD}W5QRXn.t || |d}tt|ttd t|d }nvt|tr|d d|kr|d}n|}d d|D}n8|d t || |d}tt|ttd t|d }|d t || dd}|dt|d koXt|d k}|r@|dtdt|i}td|i}|d|dk r|dj||d<|d|dj||d<nZ|d|ddd|d<|dt|d<|ddd|d<|dt|d<|j|ddd}|dtgggd}t|D]}|d|t|}|dkr| dkrtd |dkr| d!krt |}n| dkrt!|}||d<d"|d<t"j#|d#<|dkr |d$d%|d&<d&}n|}|r|d'|d(t|d)krt$|| |j%|d|kdfj&d)d*}||j%|d|kdfj&d)|d#<nt|d krt|d kr|d+t'd,n^t|d krt|d kr|d-|d(t$|| t|d)d*}t|&d)|d#<|j(|d&id.}|dk r@|d#dd&|d/g}n|d#dd&d/g}tj)||gd0dd!d1}qd|d2|d3t*|d/d#d&||||d4}|d5| dk r*|d6t| d kr|d7d }n|d8d}|d9g}d } | +D]$\}!}"|,|!|"| d:| d 7} q||d;<n|d<d}t-||}#|#|d<|d=| dk rf| |d><|dk rt |d?}$t .||$W5QRX|S)@aSGenerate COCO-formatted labels from one or multiple geojsons and images. This function ingests optionally georegistered polygon labels in geojson format alongside image(s) and generates .json files per the `COCO dataset specification`_ . Some models, like many Mask R-CNN implementations, require labels to be in this format. The function assumes you're providing image file(s) and geojson file(s) to create the dataset. If the number of images and geojsons are both > 1 (e.g. with a SpaceNet dataset), you must provide a regex pattern to extract matching substrings to match images to label files. .. _COCO dataset specification: http://cocodataset.org/ Arguments --------- image_src : :class:`str` or :class:`list` or :class:`dict` Source image(s) to use in the dataset. This can be:: 1. a string path to an image, 2. the path to a directory containing a bunch of images, 3. a list of image paths, 4. a dictionary corresponding to COCO-formatted image records, or 5. a string path to a COCO JSON containing image records. 
        If a directory, the `recursive` flag will be used to determine whether
        or not to descend into sub-directories.
    label_src : :class:`str` or :class:`list`
        Source labels to use in the dataset. This can be a string path to a
        geojson, the path to a directory containing multiple geojsons, or a
        list of geojson file paths. If a directory, the `recursive` flag will
        determine whether or not to descend into sub-directories.
    output_path : str, optional
        The path to save the JSON-formatted COCO records to. If not provided,
        the records will only be returned as a dict, and not saved to file.
    image_ext : str, optional
        The string to use to identify images when searching directories. Only
        has an effect if `image_src` is a directory path. Defaults to
        ``".tif"``.
    matching_re : str, optional
        A regular expression pattern to match filenames between `image_src`
        and `label_src` if both are directories of multiple files. This has
        no effect if those arguments do not both correspond to directories or
        lists of files. Will raise a ``ValueError`` if multiple files are
        provided for both `image_src` and `label_src` but no `matching_re` is
        provided.
    category_attribute : str, optional
        The name of an attribute in the geojson that specifies which category
        a given instance corresponds to. If not provided, it's assumed that
        only one class of object is present in the dataset, which will be
        termed ``"other"`` in the output json.
    score_attribute : str, optional
        The name of an attribute in the geojson that specifies the prediction
        confidence of a model.
    preset_categories : :class:`list` of :class:`dict` s, optional
        A pre-set list of categories to use for labels. These categories should
        be formatted per `the COCO category specification`_.
        example:
        [{'id': 1, 'name': 'Fighter Jet', 'supercategory': 'plane'},
         {'id': 2, 'name': 'Military Bomber', 'supercategory': 'plane'},
         ...
        ]
    include_other : bool, optional
        If set to ``True``, and `preset_categories` is provided, objects that
        don't fall into the specified categories will not be removed from the
        dataset. They will instead be passed into a category named ``"other"``
        with its own associated category ``id``. If ``False``, objects whose
        categories don't match a category from `preset_categories` will be
        dropped.
    info_dict : dict, optional
        A dictionary with the following key-value pairs::

            - ``"year"``: :class:`int` year of creation
            - ``"version"``: :class:`str` version of the dataset
            - ``"description"``: :class:`str` string description of the dataset
            - ``"contributor"``: :class:`str` who contributed the dataset
            - ``"url"``: :class:`str` URL where the dataset can be found
            - ``"date_created"``: :class:`datetime.datetime` when the dataset
                was created

    license_dict : dict, optional
        A dictionary containing the licensing information for the dataset, with
        the following key-value pairs::

            - ``"name"``: :class:`str` the name of the license.
            - ``"url"``: :class:`str` a link to the dataset's license.

        *Note*: This implementation assumes that all of the data uses one
        license. If multiple licenses are provided, the image records will not
        be assigned a license ID.
    recursive : bool, optional
        If `image_src` and/or `label_src` are directories, setting this flag
        to ``True`` will induce solaris to descend into subdirectories to find
        files. By default, solaris does not traverse the directory tree.
    explode_all_multipolygons : bool, optional
        Explode the multipolygons into individual geometries using
        sol.utils.geo.split_multi_geometries. Be sure to inspect which
        geometries are multigeometries; the individual geometries within them
        may represent artifacts rather than true labels.
    remove_all_multipolygons : bool, optional
        Filters MultiPolygons and GeometryCollections out of each tile
        geodataframe. Alternatively, you can manually edit each such geometry
        to be a single polygon before converting to COCO format.
verbose : int, optional Verbose text output. By default, none is provided; if ``True`` or ``1``, information-level outputs are provided; if ``2``, extremely verbose text is output. Returns ------- coco_dataset : dict A dictionary following the `COCO dataset specification`_ . Depending on arguments provided, it may or may not include license and info metadata. z(Preparing image filename: image ID dict.jsonz-COCO json provided. Extracting fname:id dict.rcSsi|]}|d|dqS file_nameid.0imagerrg/home/ec2-user/SageMaker/vegetation-management-remars2022/remars2022-workshop/libs/solaris/data/coco.py sz geojson2coco..Zimages) recursive extensionz3image COCO dict provided. Extracting fname:id dict.cSsi|]}|d|dqSrrrrrrrszaNon-COCO formatted image set provided. Generating image fname:id dict with arbitrary ID integers.zPreparing label filename list.z5Checking if images and vector labels must be matched.zMatching images to label files. image_fname label_fnamez2Getting substrings for matching from image fnames.NZ match_substrz2Getting substrings for matching from label fnames.zLmatching_re is none, getting full filenames without extensions for matching.cSstjtj|ddSNrr ospathsplitextsplitxrrrzgeojson2coco..cSstjtj|ddSrrr$rrrr&r'inner)onhowzLoading labels.)r category_strgeometryz Reading in {}TzUOnly one of remove_all_multipolygons or explode_all_multipolygons can be set to True.Fimage_idzDNo category attribute provided. Creating a default "other" category.otherr+z*do_matches is True, finding matching imagez Converting to pixel coordinates.r ) override_crsZim_pathz2do_matches is False. Many images:1 label detected.z0one label file: many images not implemented yet.z.do_matches is False. 
1 image:1 label detected.columnsr,index)axis ignore_indexsortzFinished loading labels.z&Generating COCO-formatted annotations.)geom_col image_id_col category_col score_colpreset_categories include_otherverbosez4Generating COCO-formatted image and license records.zGetting license ID.z supercategoryz(No preset_categories, have category_col.z&Collecting unique category names from .z+Generating category ID numbers arbitrarily.cSsi|]\}}||qSrr)rkvrrrrsz$df_to_coco_annos..z(No category column or preset categories.z,Setting category to "other" for all objects.r9r.r1zChecking geometries.zGetting area of geometries.cSs|jSN)arear$rrrr&r'z"df_to_coco_annos..ryz Getting geometry bounding boxes.cSs t|jSrx)rboundsr$rrrr&r'bbox category_id annotation_idscorec Ss|dkrB|dt||t||t||g|d|dddS|dt||t||t||gt|||d|dddSdS)z5get a single annotation record from a row of temp_df.Nr}ryr{r )rr.r| segmentationryr{iscrowd)rr.r|rr~ryr{r)rFrfloat)rowr7category_id_colr8r:rrr _row_to_cocos$       z&df_to_coco_annos.._row_to_coco)r4r7rr8r:)rcategory_name_colsupercategory_col) annotations categoriesrA)"rBrCrDrErrFrGrcopyr]%_coco_category_name_id_dict_from_listrPrSr@r`isinr^arrayramaxrguniquerOrQrRrcrWrmaptolistcoco_categories_dict_from_dfrKrri)dfrjr7r8r9r:r;rr<Z starting_idr=rlZtemp_df category_dictZcategory_namesZother_idrZcoco_annotationsZcoco_categoriesZ output_dictrrrrrre1s5                             recCsV||g}|d|di}|dk r.||d||<||}|j|d}|}|jddS)aExtract category IDs, category names, and supercat names from df. Arguments --------- df : :class:`pandas.DataFrame` A :class:`pandas.DataFrame` of records to filter for category info. category_id_col : str The name for the column in `df` that contains category IDs. category_name_col : str The name for the column in `df` that contains category names. supercategory_col : str, optional The name for the column in `df` that contains supercategory names, if one exists. If not provided, supercategory will be left out of the output. 
Returns ------- :class:`list` of :class:`dict` s A :class:`list` of :class:`dict` s that contain category records per the `COCO dataset specification`_ . rr>Nrtr1records)orient)rgrcdrop_duplicatesto_dict)rrrrZ cols_to_keepZ rename_dictZ coco_cat_dfrrrrs  rc Cspg}|D]^\}}t|}|j}|j}W5QRX|tj|d||d}|dk r`||d<||q |S)aTTake a dict of ``image_fname: image_id`` pairs and make a coco dict. Note that this creates a relatively limited version of the standard `COCO image record format`_ record, which only contains the following keys:: * id ``(int)`` * width ``(int)`` * height ``(int)`` * file_name ``(str)`` * license ``(int)``, optional .. _COCO image record format: http://cocodataset.org/#format-data Arguments --------- image_ref : dict A dictionary of ``image_fname: image_id`` key-value pairs. license_id : int, optional The license ID number for the relevant license. If not provided, no license information will be included in the output. Returns ------- coco_images : list A list of COCO-formatted image records ready for export to json. r)rrwidthheightNlicense) rfrasteriorKrrr r!r#rg) rnrqZ image_recordsrr.rmrrZ im_recordrrrrhs  rhcCsdd|D}|S)z5Extract ``{category_name: category_id}`` from a list.cSsi|]}|d|dqS)r>rr)rcategoryrrrr"sz9_coco_category_name_id_dict_from_list..r)Z category_listrrrrrsrcCs`t|tr|St|trNtj|r2t|||dStj|rD|gStdntd |dS)zCGet a list of filenames from p, which can be a dir, fname, or list.)Ztraverse_subdirsrz1If a string is provided, it must be a valid path.z{} is not a string or list.N) rHrPrIr r!isdirrisfiler]rZ)prrrrrrM's     rM)Nr NNNNTNNFFFFr ) Nr,NNNNNTrr )N)N)Fr )Z utils.corerrrZ utils.georrrZ utils.logrZvector.polygonr r numpyr^rr rr pandasrT geopandasr[rBrsrerrhrrMrrrrsV   $  $ -