API documentation

Loading datasets

econuy.load.load_dataset(name: str, data_dir: str | Path | None = None, skip_cache: bool = False, force_overwrite: bool = False, skip_update: bool = False) → Dataset

Load a dataset by name, optionally skipping cache and forcing overwrite.

Parameters:
  • name (str) – The name of the dataset to load.

  • data_dir (Union[str, Path, None], optional) – The directory where the dataset is stored or will be stored. If None, the default data directory is used. Default is None.

  • skip_cache (bool, optional) – If True, the cache will be skipped and a new dataset will be retrieved. Default is False.

  • force_overwrite (bool, optional) – If True, the existing dataset will be overwritten. Default is False.

  • skip_update (bool, optional) – If True, the dataset will not be updated if it already exists. Default is False.

Returns:

The loaded dataset.

Return type:

Dataset

Raises:
  • ValueError – If the dataset name is not available in the registry.

  • AssertionError – If the existing dataset has changed and force_overwrite is False.

econuy.load.load_datasets_parallel(names: List[str], data_dir: str | Path | None = None, skip_cache: bool = False, force_overwrite: bool = False, skip_update: bool = False, max_workers: int | None = None, executor_type: Literal['thread', 'process'] = 'thread') → Dict[str, Dataset]

Load multiple datasets in parallel using either threading or multiprocessing.

Parameters:
  • names (List[str]) – List of dataset names to load.

  • data_dir (Union[str, Path, None], optional) – Directory where datasets are stored. If None, a default directory is used.

  • skip_cache (bool, optional) – If True, skip loading from cache. Default is False.

  • force_overwrite (bool, optional) – If True, force overwrite existing datasets. Default is False.

  • skip_update (bool, optional) – If True, skip updating datasets that already exist. Default is False.

  • max_workers (Optional[int], optional) – Maximum number of workers to use for parallel loading. If None, it will use the default number of workers.

  • executor_type (Literal["thread", "process"], optional) – Type of executor to use for parallel loading. Can be “thread” for ThreadPoolExecutor or “process” for ProcessPoolExecutor. Default is “thread”.

Returns:

A dictionary where keys are dataset names and values are the loaded datasets.

Return type:

Dict[str, Dataset]

Raises:

Exception – Not propagated to the caller: if loading a dataset fails, the error is printed and that dataset is left out of the returned dictionary.
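The print-and-skip behaviour described above can be illustrated with the standard library's concurrent.futures module. This is a minimal sketch, not econuy's implementation: fake_load is a hypothetical stand-in for load_dataset, the dataset names are made up, and the real function may report errors differently.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Optional


def fake_load(name: str) -> str:
    """Hypothetical stand-in for econuy.load.load_dataset."""
    if name.startswith("bad"):
        raise ValueError(f"{name} is not in the registry")
    return f"<Dataset {name}>"


def load_parallel(names: List[str], max_workers: Optional[int] = None) -> Dict[str, str]:
    results: Dict[str, str] = {}
    # max_workers=None lets ThreadPoolExecutor choose its own default pool size.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the dataset name it is loading.
        futures = {executor.submit(fake_load, name): name for name in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Print the error and skip this dataset instead of re-raising.
                print(f"Error loading {name}: {exc}")
    return results


loaded = load_parallel(["dataset_a", "bad_dataset", "dataset_b"])
print(sorted(loaded))  # ['dataset_a', 'dataset_b']
```

Swapping in ProcessPoolExecutor (executor_type="process") runs each load in a separate process; the submitted function must then be picklable, i.e. defined at module top level.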

Working with datasets

class econuy.base.Dataset(name: str, data: DataFrame, metadata: DatasetMetadata, transformed: bool = False)

Bases: object

A class to represent a collection of economic data.

Parameters:
  • name (str) – The name of the dataset.

  • data (pd.DataFrame) – The economic data.

  • metadata (DatasetMetadata) – The metadata of the data.

Return type:

None

See also

pd.DataFrame, DatasetMetadata

validate() → None

Validate the dataset.

Raises:

AssertionError – If any of the following holds: the number of indicators does not match the number of columns in the data; any of the indicators is not in the data; the index of the data is not a DatetimeIndex; or the data contains non-numeric values.
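The four checks listed above can be expressed as plain pandas assertions. A minimal sketch, assuming the dataset wraps a DataFrame whose columns are the indicator IDs; validate_frame and the column names are hypothetical, not part of econuy.

```python
import pandas as pd


def validate_frame(data: pd.DataFrame, indicator_ids: list) -> None:
    # One column per indicator, and every indicator present as a column.
    assert len(indicator_ids) == len(data.columns), "indicator count mismatch"
    assert all(i in data.columns for i in indicator_ids), "indicator missing from data"
    # Observations must be indexed by timestamps.
    assert isinstance(data.index, pd.DatetimeIndex), "index is not a DatetimeIndex"
    # Every column must have a numeric dtype.
    assert all(pd.api.types.is_numeric_dtype(dtype) for dtype in data.dtypes), "non-numeric values"


frame = pd.DataFrame(
    {"ind_1": [1.0, 2.0], "ind_2": [3, 4]},
    index=pd.to_datetime(["2023-01-31", "2023-02-28"]),
)
validate_frame(frame, ["ind_1", "ind_2"])  # passes silently

try:
    # Dropping the dates leaves a RangeIndex, which fails the index check.
    validate_frame(frame.reset_index(drop=True), ["ind_1", "ind_2"])
except AssertionError as err:
    message = str(err)
print(message)  # index is not a DatetimeIndex
```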

to_detailed(language: str = 'es') → DataFrame

Rename the data using the metadata.

Parameters:

language (str, default "es") – The language to use for the metadata.

Returns:

The data with the indicators renamed.

Return type:

pd.DataFrame

to_named(language: str = 'es') → DataFrame

Rename the data using the metadata.

Parameters:

language (str, default "es") – The language to use for the metadata.

Returns:

The data with the indicators renamed.

Return type:

pd.DataFrame

to_json() → dict

Convert the dataset to a valid JSON dictionary.

Returns:

A JSON representation of the dataset.

Return type:

dict

save(data_dir: str | Path | None = None, name: str | None = None) → None

Save the dataset to a directory.

Parameters:
  • data_dir (str, Path or None, default None) – The directory to save the dataset to.

  • name (str or None, default None) – The name to save the dataset as, without file suffixes.

Return type:

None

infer_frequency() → Timedelta | None

Infer the frequency of the data.

Returns:

The inferred frequency of the data.

Return type:

Optional[pd.Timedelta]

call_pandas_method(method: str, *args, **kwargs) → Dataset

select(ids: str | List[str] | None = None, names: str | List[str] | None = None, language: str = 'es') → Dataset

filter(start_date: str | datetime | None = None, end_date: str | datetime | None = None) → Dataset

resample(rule: DateOffset | Timedelta | str, operation: Literal['sum', 'mean', 'last', 'upsample'] = 'sum', interpolation: str = 'linear') → Dataset

Wrapper for the resample method in Pandas that integrates with econuy dataframes’ metadata.

Trim partial bins, i.e. do not calculate the resampled period if it is not complete, unless the input dataframe has no defined frequency, in which case no trimming is done.

Parameters:
  • rule (pd.DateOffset, pd.Timedelta or str) – Target frequency to resample to. See the pandas offset aliases.

  • operation ({'sum', 'mean', 'last', 'upsample'}, default 'sum') – Operation to use for resampling.

  • interpolation (str, default 'linear') – Method to use when missing data are produced as a result of resampling, for example when upsampling to a higher frequency. See the pandas interpolation methods.

Return type:

Dataset

Raises:
  • ValueError – If operation is not one of available options.

  • ValueError – If the input dataframe’s columns do not have the appropriate levels.

Warns:

UserWarning – If input frequencies cannot be assigned a numeric value, preventing incomplete bin trimming.
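The partial-bin trimming rule can be sketched with plain pandas. This is a minimal illustration, not econuy's code: it assumes a monthly flow series and the 'sum' operation, groups by quarterly periods instead of calling resample, and drops any quarter with fewer than three monthly observations. The function name and data are hypothetical.

```python
import pandas as pd


def resample_sum_trimmed(series: pd.Series, months_per_bin: int = 3) -> pd.Series:
    """Quarterly sums of a monthly series, dropping incomplete quarters."""
    quarters = series.index.to_period("Q")
    grouped = series.groupby(quarters)
    sums = grouped.sum()
    counts = grouped.count()
    # Keep only bins that contain a full quarter of monthly observations.
    return sums[counts == months_per_bin]


# Hypothetical monthly flow data from January to August 2023 (month-end dates).
dates = pd.to_datetime(
    ["2023-01-31", "2023-02-28", "2023-03-31", "2023-04-30",
     "2023-05-31", "2023-06-30", "2023-07-31", "2023-08-31"]
)
flows = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=dates)
quarterly = resample_sum_trimmed(flows)
print(quarterly)  # 2023Q1 -> 6, 2023Q2 -> 15; 2023Q3 (July-August only) is trimmed
```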

rolling(window: int, operation: Literal['sum', 'mean'] = 'sum') → Dataset

Wrapper for the rolling method in Pandas that integrates with econuy dataframes’ metadata.

If window is None, try to infer the frequency and set the window according to the following mapping: {'YE-DEC': 1, 'QE-DEC': 4, 'ME': 12}; that is, each period is calculated as the sum or mean of the last year.

Parameters:
  • window (int) – How many periods the window should cover.

  • operation ({'sum', 'mean'}, default 'sum') – Operation used to calculate rolling windows.

Return type:

Dataset

Raises:
  • ValueError – If operation is not one of available options.

  • ValueError – If the input dataframe’s columns do not have the appropriate levels.

Warns:

UserWarning – If the input dataframe is a stock time series, for which rolling operations are not recommended.
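The frequency-to-window mapping above can be sketched with pd.infer_freq. This is a minimal illustration under stated assumptions, not econuy's implementation: the function name is hypothetical, and the older pandas aliases ("A-DEC", "Q-DEC", "M") are added because pandas versions before 2.2 report those instead of "YE-DEC", "QE-DEC" and "ME".

```python
import pandas as pd

# Window sizes from the mapping above; the pre-2.2 pandas aliases are added
# because older versions of pd.infer_freq return "A-DEC", "Q-DEC" and "M".
WINDOW_BY_FREQ = {
    "YE-DEC": 1, "A-DEC": 1,
    "QE-DEC": 4, "Q-DEC": 4,
    "ME": 12, "M": 12,
}


def rolling_last_year(series: pd.Series, operation: str = "sum") -> pd.Series:
    freq = pd.infer_freq(series.index)
    if freq not in WINDOW_BY_FREQ:
        raise ValueError(f"cannot infer a one-year window for frequency {freq!r}")
    rolled = series.rolling(WINDOW_BY_FREQ[freq])
    if operation == "sum":
        return rolled.sum()
    if operation == "mean":
        return rolled.mean()
    raise ValueError("operation must be 'sum' or 'mean'")


# Hypothetical quarterly flow data (calendar quarter ends).
dates = pd.to_datetime(["2022-03-31", "2022-06-30", "2022-09-30", "2022-12-31", "2023-03-31"])
flows = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=dates)
print(rolling_last_year(flows).tolist())  # [nan, nan, nan, 100.0, 140.0]
```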

chg_diff(operation: Literal['chg', 'diff'] = 'chg', period: Literal['last', 'inter', 'annual'] = 'last') → Dataset

Wrapper for the pct_change and diff Pandas methods.

Calculate percentage change or difference for dataframes. The period argument takes into account the frequency of the dataframe; for example, inter (interannual) calculates percent changes or differences with periods=4 for quarterly data but periods=12 for monthly data.

Parameters:
  • operation ({'chg', 'diff'}, default 'chg') – chg for percent change or diff for differences.

  • period ({'last', 'inter', 'annual'}, default 'last') – Period with which to calculate change or difference: last for the previous period (last month for monthly data), inter for the same period last year, and annual for the same period last year but using annual sums.

Return type:

Dataset

Raises:
  • ValueError – If the dataframe is not of frequency ME (month), QE or QE-DEC (quarter), or YE or YE-DEC (year).

  • ValueError – If the operation parameter does not have a valid argument.

  • ValueError – If the period parameter does not have a valid argument.

  • ValueError – If the input dataframe’s columns do not have the appropriate levels.
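The three period options differ only in which series is compared and how far back. A minimal pandas sketch, assuming the frequency is passed in explicitly as a hypothetical label (econuy infers it) and that percent changes are scaled by 100; the function and data are illustrative, not econuy's code.

```python
import pandas as pd

# Periods per year for the supported frequencies (hypothetical labels).
PERIODS_PER_YEAR = {"monthly": 12, "quarterly": 4, "annual": 1}


def chg_diff(series: pd.Series, freq: str, operation: str = "chg", period: str = "last") -> pd.Series:
    per_year = PERIODS_PER_YEAR[freq]
    if period == "last":
        base, lag = series, 1
    elif period == "inter":
        base, lag = series, per_year
    elif period == "annual":
        # Compare rolling annual sums with the same period one year earlier.
        base, lag = series.rolling(per_year).sum(), per_year
    else:
        raise ValueError("period must be 'last', 'inter' or 'annual'")
    previous = base.shift(lag)
    if operation == "chg":
        return (base / previous - 1) * 100  # expressed in percent (an assumption)
    if operation == "diff":
        return base - previous
    raise ValueError("operation must be 'chg' or 'diff'")


quarterly = pd.Series([100.0, 102.0, 104.0, 106.0, 110.0, 112.0, 114.0, 116.0])
interannual = chg_diff(quarterly, "quarterly", "chg", "inter")
annual_diff = chg_diff(quarterly, "quarterly", "diff", "annual")
print(round(interannual.iloc[4], 6))  # 10.0
print(annual_diff.iloc[7])  # 40.0
```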

rebase(start_date: str | datetime, end_date: str | datetime | None = None, base: float = 100.0) → Dataset

Rebase dataset to a date or range of dates.

Parameters:
  • start_date (string or datetime.datetime) – Date to which series will be rebased.

  • end_date (string or datetime.datetime, default None) – If specified, series will be rebased to the average over the period from start_date to end_date.

  • base (float, default 100) – Value that start_date (or the average from start_date to end_date) takes after rebasing.

Return type:

Dataset
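Rebasing divides every observation by the value at the base date (or the base-period average) and multiplies by base. A minimal pandas sketch of that arithmetic, assuming a single series with a DatetimeIndex; the function and data are illustrative, not econuy's implementation.

```python
from typing import Optional

import pandas as pd


def rebase(series: pd.Series, start_date: str, end_date: Optional[str] = None, base: float = 100.0) -> pd.Series:
    if end_date is None:
        # pd.Timestamp forces an exact-date lookup that returns a single value.
        reference = series.loc[pd.Timestamp(start_date)]
    else:
        # Label-based slicing is inclusive of both start_date and end_date.
        reference = series.loc[start_date:end_date].mean()
    return series / reference * base


dates = pd.to_datetime(["2023-01-31", "2023-02-28", "2023-03-31", "2023-04-30"])
index_values = pd.Series([2.0, 4.0, 6.0, 8.0], index=dates)
print(rebase(index_values, "2023-02-28").tolist())  # [50.0, 100.0, 150.0, 200.0]
print(rebase(index_values, "2023-01-31", "2023-02-28").iloc[2])  # 200.0
```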

convert(flavor: Literal['usd', 'real', 'gdp'], start_date: str | datetime | None = None, end_date: str | datetime | None = None, error_handling: Literal['raise', 'coerce', 'ignore'] = 'raise') → Dataset

Convert dataset from UYU to USD, from UYU to real UYU or from UYU/USD to % GDP.

flavor=usd: Convert a dataset from Uruguayan pesos to US dollars. Takes into account whether the input dataset is a flow or a stock, in order to choose between the end-of-period and the monthly average NXR. Also takes into account the input dataframe’s frequency and whether columns represent rolling averages or sums.

flavor=real: Convert a dataset’s columns to real prices. Takes into account the input dataset’s frequency and whether columns represent rolling averages or sums. Allows choosing a single period, a range of dates or no period as a base (i.e., the period for which the average/sum of the input dataframe and the output dataframe is the same).

flavor=gdp: Convert a dataset to percentage of GDP. Takes into account the input dataset’s currency for choosing UYU or USD GDP. If the input dataset’s frequency is higher than quarterly, GDP will be upsampled and linear interpolation will be performed to complete missing data. If the input dataframe’s “cumulative_periods” level is not 12 for monthly frequency or 4 for quarterly frequency, a rolling window is first computed on the input dataframe.

In all cases, if input dataframe’s frequency is higher than monthly (daily, business, etc.), resample to monthly frequency.
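The core arithmetic of the three flavors can be sketched on aligned monthly series. This is a minimal illustration only: it assumes stocks use the end-of-period exchange rate and flows the period average, uses a single base date for real prices, and omits the frequency, rolling-window and resampling handling described above. Function names and data are hypothetical.

```python
import pandas as pd


def to_usd(values: pd.Series, nxr_average: pd.Series, nxr_end: pd.Series, is_stock: bool) -> pd.Series:
    # Assumed rule: stocks use the end-of-period rate, flows the period-average rate.
    rate = nxr_end if is_stock else nxr_average
    return values / rate


def to_real(values: pd.Series, price_index: pd.Series, base_date: str) -> pd.Series:
    # Deflate, then rescale so that the base period keeps its nominal value.
    return values / price_index * price_index.loc[pd.Timestamp(base_date)]


def to_pct_gdp(values: pd.Series, gdp: pd.Series) -> pd.Series:
    # Both series must be in the same currency and aligned on the same dates.
    return values * 100 / gdp


dates = pd.to_datetime(["2023-01-31", "2023-02-28", "2023-03-31"])
uyu = pd.Series([400.0, 420.0, 441.0], index=dates)
nxr_avg = pd.Series([40.0, 40.0, 42.0], index=dates)
nxr_eop = pd.Series([40.0, 41.0, 42.0], index=dates)
cpi = pd.Series([100.0, 105.0, 110.25], index=dates)
gdp = pd.Series([4000.0, 4200.0, 4410.0], index=dates)

print(to_usd(uyu, nxr_avg, nxr_eop, is_stock=False).tolist())  # [10.0, 10.5, 10.5]
print(to_real(uyu, cpi, "2023-01-31").tolist())  # [400.0, 400.0, 400.0]
print(to_pct_gdp(uyu, gdp).tolist())  # [10.0, 10.0, 10.0]
```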

Parameters:
  • flavor (str) – usd for USD, real for real UYU, gdp for % GDP.

  • start_date (str, datetime.date or None, default None) – Only used if flavor=real. If set to a date-like string or a date, and end_date is None, the base period will be start_date.

  • end_date (str, datetime.date or None, default None) – Only used if flavor=real. If start_date is set, calculate so that the data is in constant prices of start_date-end_date.

  • error_handling ({"raise", "coerce", "ignore"}, default "raise") – What to do when the input dataset can’t be converted. “coerce” sets the values to np.nan, “ignore” is a no-op, and “raise” raises an error.

Return type:

Dataset

decompose(method: Literal['x13', 'loess', 'mloess', 'moving_averages'] = 'x13', fallback: Literal['loess', 'mloess', 'moving_averages'] = 'loess', component: Literal['t-c', 'sa'] = 'sa', fn_kwargs: dict | None = None, ignore_warnings: bool = True, error_handling: Literal['raise', 'coerce', 'ignore'] = 'raise') → Dataset

Decompose the dataset’s series and return the selected component (seasonally adjusted or trend-cycle).

Parameters:
  • method ({'x13', 'loess', 'mloess', 'moving_averages'}, default 'x13') – Method to use for decomposition.

  • fallback ({'loess', 'mloess', 'moving_averages'}, default 'loess') – Fallback method to use if the main method fails.

  • component ({'t-c', 'sa'}, default 'sa') – Component to return. ‘t-c’ for trend-cycle, ‘sa’ for seasonally adjusted.

  • fn_kwargs (dict, default None) – Additional keyword arguments to pass to the decomposition function.

  • ignore_warnings (bool, default True) – Whether to ignore warnings.

  • error_handling ({"raise", "coerce", "ignore"}, default "raise") – What to do when the input dataset can’t be decomposed. “coerce” sets the values to np.nan, “ignore” is a no-op, and “raise” raises an error.

Return type:

Dataset
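The interplay between method, fallback and error_handling can be illustrated as a try-then-fallback chain. This is a minimal sketch, not econuy's implementation: failing_x13 and centered_moving_average are hypothetical stand-ins (the moving average is a crude trend-cycle estimate swapped in for X-13 and LOESS), and only the control flow is meant to mirror the parameters above.

```python
from typing import Callable

import numpy as np
import pandas as pd

Decomposer = Callable[[pd.Series], pd.Series]


def failing_x13(series: pd.Series) -> pd.Series:
    """Hypothetical stand-in for an X-13 run that fails, e.g. because the binary is missing."""
    raise RuntimeError("X-13 binary not available")


def centered_moving_average(series: pd.Series) -> pd.Series:
    """Hypothetical trend-cycle estimate: a centered 3-period moving average."""
    return series.rolling(3, center=True).mean()


def decompose_with_fallback(
    series: pd.Series,
    method: Decomposer,
    fallback: Decomposer,
    error_handling: str = "raise",
) -> pd.Series:
    try:
        return method(series)
    except Exception:
        pass  # The main method failed; try the fallback next.
    try:
        return fallback(series)
    except Exception:
        if error_handling == "raise":
            raise
        if error_handling == "coerce":
            return pd.Series(np.nan, index=series.index)  # all values set to NaN
        return series  # "ignore": return the input unchanged


data = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
trend = decompose_with_fallback(data, failing_x13, centered_moving_average)
print(trend.tolist())  # [nan, 2.0, 3.0, 4.0, nan]
```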

class econuy.base.DatasetMetadata(name: str, indicator_metadata: dict, created_at: datetime | None = None, config: DatasetConfig | None = None)

Bases: object

property indicator_ids: list

Get the list of indicator IDs in the metadata.

Returns:

The list of indicator IDs.

Return type:

list

property has_common_metadata: bool

Check if all indicators have the same metadata.

Returns:

True if all indicators have the same metadata, False otherwise.

Return type:

bool

property common_metadata_dict: dict

Get the common metadata dictionary.

Returns:

The metadata dictionary excluding keys that are not common to all indicators.

Return type:

dict
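The two properties above can be sketched with plain dictionaries. A minimal illustration, assuming indicator_metadata maps each indicator ID to its own metadata dict and that "common" means the same key with the same value for every indicator; the function names mirror the properties but are standalone, and the metadata keys are hypothetical.

```python
from typing import Dict


def has_common_metadata(indicator_metadata: Dict[str, dict]) -> bool:
    # True when every indicator carries exactly the same metadata dictionary.
    per_indicator = list(indicator_metadata.values())
    return all(meta == per_indicator[0] for meta in per_indicator)


def common_metadata_dict(indicator_metadata: Dict[str, dict]) -> dict:
    # Keep only the key/value pairs that every indicator shares.
    per_indicator = list(indicator_metadata.values())
    if not per_indicator:
        return {}
    first, rest = per_indicator[0], per_indicator[1:]
    return {
        key: value
        for key, value in first.items()
        if all(key in other and other[key] == value for other in rest)
    }


metadata = {
    "ind_1": {"currency": "UYU", "unit": "Millions", "seas_adj": "NSA"},
    "ind_2": {"currency": "UYU", "unit": "Index", "seas_adj": "NSA"},
}
print(has_common_metadata(metadata))  # False
print(common_metadata_dict(metadata))  # {'currency': 'UYU', 'seas_adj': 'NSA'}
```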

update_indicator_metadata(indicator: str, single_indicator_metadata: dict) → DatasetMetadata

Update the metadata for a specific indicator.

Parameters:
  • indicator (str) – The indicator to update.

  • single_indicator_metadata (dict) – The metadata to update with.

Returns:

The updated metadata.

Return type:

DatasetMetadata

update_indicator_metadata_value(indicator: str, key: str, value: str) → DatasetMetadata

Update a single metadata key for a specific indicator.

Parameters:
  • indicator (str) – The indicator to update.

  • key (str) – The key to update.

  • value (str) – The value to update with.

Returns:

The updated metadata.

Return type:

DatasetMetadata

update_dataset_metadata(indicator_metadata: dict) → DatasetMetadata

Update the metadata for all indicators.

Parameters:

indicator_metadata (dict) – The new metadata to update with.

Returns:

The updated metadata.

Return type:

DatasetMetadata

add_transformation_step(transformation: dict) → DatasetMetadata

Add a transformation step to the metadata.

Parameters:

transformation (dict) – The transformation step to add.

Returns:

The updated metadata.

Return type:

DatasetMetadata

copy() → DatasetMetadata

Create a copy of the metadata.

Returns:

The copied metadata.

Return type:

DatasetMetadata

to_dict() → Dict

save(name: str, data_dir: str | Path | None = None) → None

static cast_metadata(indicator_metadata: dict, indicator_ids: list, full_names: list) → dict

classmethod from_cast(name: str, base_metadata: dict, indicator_ids: list, indicator_names: list) → DatasetMetadata

Create a metadata instance by casting base metadata to the given indicators.

Parameters:
  • name (str) – The name to assign to the metadata instance.

  • base_metadata (dict) – The base metadata.

  • indicator_ids (list) – The IDs of the indicators.

  • indicator_names (list) – The full names of the indicators.

Returns:

The created metadata instance.

Return type:

DatasetMetadata

classmethod from_metadatas(name: str, metadatas: List[DatasetMetadata]) → DatasetMetadata

Create a metadata instance from a list of metadatas.

Parameters:
  • name (str) – The name to assign to the new metadata instance.

  • metadatas (List[DatasetMetadata]) – The list of metadata instances.

Returns:

The created metadata instance.

Return type:

DatasetMetadata

classmethod from_json(path: str | Path) → DatasetMetadata

Create a metadata instance from a JSON file.

Parameters:

path (str or Path) – The path to the JSON file.

Returns:

The created metadata instance.

Return type:

DatasetMetadata

class econuy.base.DatasetConfig(name: str)

Bases: object

load() → None

Utilities

class econuy.utils.operations.DatasetRegistry

Bases: object

get_multiple(names: List[str]) → Dict

Retrieve multiple datasets by their names.

Parameters:

names (List[str]) – A list of dataset names to retrieve.

Returns:

A dictionary containing the requested datasets.

Return type:

dict

get_available() → Dict

Retrieve all available datasets that are not disabled.

Returns:

A dictionary containing all available datasets.

Return type:

dict

get_custom() → Dict

Retrieve all custom datasets.

Returns:

A dictionary containing all custom datasets.

Return type:

dict

get_by_area(area: str, keep_disabled: bool = False, keep_auxiliary: bool = False) → Dict

Retrieve datasets by a specific area, with options to include disabled and auxiliary datasets.

Parameters:
  • area (str) – The area to filter datasets by.

  • keep_disabled (bool, optional) – Whether to include disabled datasets (default is False).

  • keep_auxiliary (bool, optional) – Whether to include auxiliary datasets (default is False).

Returns:

A dictionary containing the datasets that match the specified area and options.

Return type:

dict

list_available() → List[str]

List the names of all available datasets.

Returns:

A list of names of all available datasets.

Return type:

List[str]

list_custom() → List[str]

List the names of all custom datasets.

Returns:

A list of names of all custom datasets.

Return type:

List[str]

list_by_area(area: str, keep_disabled: bool = False, keep_auxiliary: bool = False) → List[str]

List the names of datasets by a specific area, with options to include disabled and auxiliary datasets.

Parameters:
  • area (str) – The area to filter datasets by.

  • keep_disabled (bool, optional) – Whether to include disabled datasets (default is False).

  • keep_auxiliary (bool, optional) – Whether to include auxiliary datasets (default is False).

Returns:

A list of names of datasets that match the specified area and options.

Return type:

List[str]
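The area filter with its keep_disabled and keep_auxiliary switches can be sketched over a plain dictionary. This is a minimal illustration, not the registry's real data model: the entry fields, dataset names and areas below are all hypothetical.

```python
from typing import Dict, List

# Hypothetical registry entries; the real registry's fields and names may differ.
REGISTRY: Dict[str, dict] = {
    "dataset_a": {"area": "prices", "disabled": False, "auxiliary": False},
    "dataset_b": {"area": "prices", "disabled": False, "auxiliary": True},
    "dataset_c": {"area": "prices", "disabled": True, "auxiliary": False},
    "dataset_d": {"area": "labor", "disabled": False, "auxiliary": False},
}


def list_by_area(
    registry: Dict[str, dict],
    area: str,
    keep_disabled: bool = False,
    keep_auxiliary: bool = False,
) -> List[str]:
    return [
        name
        for name, entry in registry.items()
        if entry["area"] == area
        # Disabled and auxiliary entries are excluded unless explicitly requested.
        and (keep_disabled or not entry["disabled"])
        and (keep_auxiliary or not entry["auxiliary"])
    ]


print(list_by_area(REGISTRY, "prices"))  # ['dataset_a']
print(list_by_area(REGISTRY, "prices", keep_auxiliary=True))  # ['dataset_a', 'dataset_b']
```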