import os
import pandas as pd
from .log import _get_logging_level
from .core import get_files_recursively
import logging


def make_dataset_csv(im_dir, im_ext='tif', label_dir=None, label_ext='json',
                     output_path='dataset.csv', stage='train', match_re=None,
                     recursive=False, ignore_mismatch=None, verbose=0):
    """Automatically generate dataset CSVs for training.

    This function creates basic CSVs for training and inference automatically.
    See the documentation tutorials for details on the specification. A regular
    expression string can be provided to extract substrings for matching images
    to labels; if not provided, it's assumed that the filenames of the image
    and label files are identical once extensions are stripped. By default,
    this function will raise an exception if multiple label files match a
    given image file, or if no label file matches an image file; see the
    `ignore_mismatch` argument for alternatives.

    Arguments
    ---------
    im_dir : str
        The path to the directory containing images to be used by your model.
        Images in sub-directories can be included by setting
        ``recursive=True``.
    im_ext : str, optional
        The file extension used by your images. Defaults to ``"tif"``. Not case
        sensitive.
    label_dir : str, optional
        The path to the directory containing the label files that correspond
        to your images. Label files in sub-directories can be included by
        setting ``recursive=True``. This argument is required if `stage` is
        ``"train"`` (default) or ``"val"``, but has no effect if `stage` is
        ``"infer"``.
    label_ext : str, optional
        The file extension used by your label files. Defaults to ``"json"``.
    output_path : str, optional
        The path to save the generated CSV to. Defaults to ``"dataset.csv"``.
    stage : str, optional
        The stage that the CSV is generated for. Can be ``"train"`` (default),
        ``"val"``, or ``"infer"``. If set to ``"train"`` or ``"val"``,
        `label_dir` must be provided or an error will occur.
    match_re : str, optional
        A regular expression pattern to extract substrings from image and
        label filenames for matching. If not provided and labels must be
        matched to images, it's assumed that image and label filenames are
        identical after stripping directory and extension. Has no effect if
        ``stage="infer"``. The pattern must contain at least one capture group
        for compatibility with :func:`pandas.Series.str.extract`.
    recursive : bool, optional
        Should sub-directories in `im_dir` and `label_dir` be traversed to
        find images and label files? Defaults to no (``False``).
    ignore_mismatch : str, optional
        Dictates how mismatches between image files and label files should be
        handled. By default, having != 1 label file per image file will raise
        a ``ValueError``. If ``ignore_mismatch="skip"``, any image files with
        != 1 matching label will be skipped.
    verbose : int, optional
        Verbose text output. By default, none is provided; if ``True`` or
        ``1``, information-level outputs are provided; if ``2``, extremely
        verbose text is output.

    Returns
    -------
    output_df : :class:`pandas.DataFrame`
        A :class:`pandas.DataFrame` with one column titled ``"image"`` and a
        second titled ``"label"`` (if ``stage != "infer"``). The function also
        saves a CSV at `output_path`.

    """
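
# The matching rules described in the docstring above can be sketched with the
# standard library alone. This is an illustration under stated assumptions, not
# the function's actual implementation (the docstring points to
# pandas.Series.str.extract for the regex step). The helper names `match_key`
# and `pair_files`, and the example file names and pattern, are hypothetical.

```python
import os
import re


def match_key(path, match_re=None):
    """Return the string used to pair an image with its label (hypothetical helper)."""
    fname = os.path.basename(path)
    if match_re is None:
        # Documented default: compare names after stripping directory and extension.
        return os.path.splitext(fname)[0]
    found = re.search(match_re, fname)
    # The pattern must contain a capture group; no match means no key.
    return found.group(1) if found else None


def pair_files(image_paths, label_paths, match_re=None, ignore_mismatch=None):
    """Pair each image with exactly one label, following the documented rules."""
    labels_by_key = {}
    for label_path in label_paths:
        key = match_key(label_path, match_re)
        if key is not None:
            labels_by_key.setdefault(key, []).append(label_path)

    pairs = []
    for image_path in image_paths:
        key = match_key(image_path, match_re)
        candidates = labels_by_key.get(key, []) if key is not None else []
        if len(candidates) == 1:
            pairs.append((image_path, candidates[0]))
        elif ignore_mismatch == "skip":
            # Images with zero or several matching labels are dropped.
            continue
        else:
            raise ValueError(
                f"{image_path} has {len(candidates)} matching label files"
            )
    return pairs


# Hypothetical file names: the third image has no label.
images = ["imgs/RGB_tile_01.tif", "imgs/RGB_tile_02.tif", "imgs/RGB_tile_03.tif"]
labels = ["labels/mask_tile_01.json", "labels/mask_tile_02.json"]
pattern = r"(tile_\d+)"

pairs = pair_files(images, labels, match_re=pattern, ignore_mismatch="skip")
for image_path, label_path in pairs:
    print(image_path, label_path)
```

# With ignore_mismatch left at None, the same inputs raise a ValueError because
# the third image has no matching label, which mirrors the default behaviour
# the docstring describes.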