API documentation
Loading datasets
- econuy.load.load_dataset(name: str, data_dir: str | Path | None = None, skip_cache: bool = False, force_overwrite: bool = False, skip_update: bool = False) Dataset[source]
Load a dataset by name, optionally skipping cache and forcing overwrite.
- Parameters:
name (str) – The name of the dataset to load.
data_dir (Union[str, Path, None], optional) – The directory where the dataset is stored or will be stored. If None, the default data directory is used. Default is None.
skip_cache (bool, optional) – If True, the cache will be skipped and a new dataset will be retrieved. Default is False.
force_overwrite (bool, optional) – If True, the existing dataset will be overwritten. Default is False.
skip_update (bool, optional) – If True, the dataset will not be updated if it already exists. Default is False.
- Returns:
The loaded dataset.
- Return type:
Dataset
- Raises:
ValueError – If the dataset name is not available in the registry.
AssertionError – If the existing dataset has changed and force_overwrite is False.
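A minimal sketch of calling code that handles the two documented exceptions. The loader is passed in as a parameter so the snippet runs without econuy installed; `load_or_none`, `fake_loader` and the dataset names are hypothetical stand-ins, not part of the library.

```python
from typing import Any, Callable, Optional


def load_or_none(loader: Callable[..., Any], name: str, **kwargs: Any) -> Optional[Any]:
    """Call a load_dataset-style function and report the documented failures."""
    try:
        return loader(name, **kwargs)
    except ValueError as exc:
        # Documented: raised when the name is not available in the registry.
        print(f"unknown dataset {name!r}: {exc}")
    except AssertionError as exc:
        # Documented: raised when the existing dataset changed and
        # force_overwrite is False.
        print(f"dataset {name!r} has changed: {exc}")
    return None


def fake_loader(name: str, force_overwrite: bool = False) -> str:
    """Hypothetical stand-in for econuy.load.load_dataset."""
    if name != "known":
        raise ValueError("not in registry")
    return f"<Dataset {name}>"


result_ok = load_or_none(fake_loader, "known")
result_missing = load_or_none(fake_loader, "unknown")
```

With the real library, `econuy.load.load_dataset` would presumably take the place of `fake_loader`.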
- econuy.load.load_datasets_parallel(names: List[str], data_dir: str | Path | None = None, skip_cache: bool = False, force_overwrite: bool = False, skip_update: bool = False, max_workers: int | None = None, executor_type: Literal['thread', 'process'] = 'thread') Dict[str, Dataset][source]
Load multiple datasets in parallel using either threading or multiprocessing.
- Parameters:
names (List[str]) – List of dataset names to load.
data_dir (Union[str, Path, None], optional) – Directory where datasets are stored. If None, a default directory is used.
skip_cache (bool, optional) – If True, skip loading from cache. Default is False.
force_overwrite (bool, optional) – If True, force overwrite existing datasets. Default is False.
skip_update (bool, optional) – If True, skip updating datasets that already exist. Default is False.
max_workers (Optional[int], optional) – Maximum number of workers to use for parallel loading. If None, it will use the default number of workers.
executor_type (Literal["thread", "process"], optional) – Type of executor to use for parallel loading. Can be “thread” for ThreadPoolExecutor or “process” for ProcessPoolExecutor. Default is “thread”.
- Returns:
A dictionary where keys are dataset names and values are the loaded datasets.
- Return type:
Dict[str, Dataset]
- Raises:
Exception – If there is an error loading any of the datasets, it will be printed and the dataset will be skipped.
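The behaviour described above (load in parallel, print errors, skip failed datasets) can be sketched with the standard library alone. This is an illustration of the pattern under stated assumptions, not the library's implementation; `load_many` and `fake_loader` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Optional


def load_many(
    loader: Callable[[str], object],
    names: List[str],
    max_workers: Optional[int] = None,
) -> Dict[str, object]:
    """Run loader(name) for every name in a thread pool, skipping failures."""
    results: Dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the dataset name it is loading.
        futures = {pool.submit(loader, name): name for name in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                # Print the error and leave this dataset out of the result.
                print(f"Skipping {name}: {exc}")
    return results


def fake_loader(name: str) -> str:
    """Hypothetical stand-in for a single-dataset loader."""
    if name == "broken":
        raise RuntimeError("download failed")
    return name.upper()


loaded = load_many(fake_loader, ["cpi", "broken", "gdp"])
```

Because failed datasets are skipped rather than raised, callers may want to compare the returned keys against the requested names.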
Working with datasets
- class econuy.base.Dataset(name: str, data: DataFrame, metadata: DatasetMetadata, transformed: bool = False)[source]
Bases:
object
A class to represent a collection of economic data.
- Parameters:
data (pd.DataFrame) – The economic data.
metadata (Metadata) – The metadata of the data.
name (str) – The name of the dataset.
- Return type:
None
See also
pd.DataFrame, Metadata
- validate() None[source]
Validate the dataset.
- Raises:
AssertionError – If the number of indicators does not match the number of columns in the data, if any of the indicators are not in the data, if the index of the data is not a DatetimeIndex, or if the data contains non-numeric values.
- to_detailed(language: str = 'es') DataFrame[source]
Rename the data using the metadata.
- Parameters:
language (str, default "es") – The language to use for the metadata.
- Returns:
The data with the indicators renamed.
- Return type:
pd.DataFrame
- to_named(language: str = 'es') DataFrame[source]
Rename the data using the metadata.
- Parameters:
language (str, default "es") – The language to use for the metadata.
- Returns:
The data with the indicators renamed.
- Return type:
pd.DataFrame
- to_json() dict[source]
Convert the dataset to a valid JSON dictionary.
- Returns:
A JSON representation of the dataset.
- Return type:
dict
- save(data_dir: str | Path | None = None, name: str | None = None) None[source]
Save the dataset to a directory.
- Parameters:
data_dir (str or Path) – The directory to save the dataset to.
name (str, default None) – The name to save the dataset as without suffixes.
- Return type:
None
- infer_frequency() Timedelta | None[source]
Infer the frequency of the data.
- Returns:
The inferred frequency of the data.
- Return type:
Optional[pd.Timedelta]
- select(ids: str | List[str] | None = None, names: str | List[str] | None = None, language: str = 'es') Dataset[source]
- filter(start_date: str | datetime | None = None, end_date: str | datetime | None = None) Dataset[source]
- resample(rule: DateOffset | Timedelta | str, operation: Literal['sum', 'mean', 'last', 'upsample'] = 'sum', interpolation: str = 'linear') Dataset[source]
Wrapper for the resample method in Pandas that integrates with econuy dataframes’ metadata.
Trim partial bins, i.e. do not calculate the resampled period if it is not complete, unless the input dataframe has no defined frequency, in which case no trimming is done.
- Parameters:
rule (pd.DateOffset, pd.Timedelta or str) – Target frequency to resample to. See Pandas offset aliases
operation ({'sum', 'mean', 'last', 'upsample'}) – Operation to use for resampling.
interpolation (str, default 'linear') – Method to use when missing data are produced as a result of resampling, for example when upsampling to a higher frequency. See Pandas interpolation methods
- Return type:
Dataset
- Raises:
ValueError – If operation is not one of the available options.
ValueError – If the input dataframe’s columns do not have the appropriate levels.
- Warns:
UserWarning – If input frequencies cannot be assigned a numeric value, preventing incomplete bin trimming.
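To illustrate what trimming partial bins means, here is a small standard-library sketch that sums monthly values into quarters and drops any quarter with fewer than three months. It assumes monthly flow data keyed by `(year, month)`; `quarterly_sum` is a hypothetical helper, not the library's code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def quarterly_sum(monthly: Dict[Tuple[int, int], float]) -> Dict[Tuple[int, int], float]:
    """Sum monthly values per quarter, keeping only complete quarters."""
    bins: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for (year, month), value in monthly.items():
        quarter = (month - 1) // 3 + 1
        bins[(year, quarter)].append(value)
    # A quarter with fewer than 3 observations is a partial bin and is trimmed.
    return {key: sum(vals) for key, vals in sorted(bins.items()) if len(vals) == 3}


# January to August: the third quarter only has two months of data.
monthly = {(2023, month): float(month) for month in range(1, 9)}
quarterly = quarterly_sum(monthly)
```

Here the third quarter is omitted from the result because September is missing.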
- rolling(window: int, operation: Literal['sum', 'mean'] = 'sum') Dataset[source]
Wrapper for the rolling method in Pandas that integrates with econuy dataframes’ metadata.
If periods is None, try to infer the frequency and set periods according to the following logic: {'YE-DEC': 1, 'QE-DEC': 4, 'ME': 12}, that is, each period will be calculated as the sum or mean of the last year.
- Parameters:
window (int, default None) – How many periods the window should cover.
operation ({'sum', 'mean'}) – Operation used to calculate rolling windows.
- Return type:
Dataset
- Raises:
ValueError – If operation is not one of the available options.
ValueError – If the input dataframe’s columns do not have the appropriate levels.
- Warns:
UserWarning – If the input dataframe is a stock time series, for which rolling operations are not recommended.
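The window inference described above can be sketched in plain Python. The frequency-to-window mapping is the one quoted in the text; the `rolling` function below is a hypothetical illustration on a list of values, not the library's method.

```python
from typing import List, Optional

# Mapping quoted in the documentation: one year of data per window.
DEFAULT_WINDOWS = {"YE-DEC": 1, "QE-DEC": 4, "ME": 12}


def rolling(
    values: List[float],
    freq: str,
    window: Optional[int] = None,
    operation: str = "sum",
) -> List[Optional[float]]:
    """Rolling sum or mean; positions without a full window are None."""
    if operation not in ("sum", "mean"):
        raise ValueError(f"invalid operation: {operation!r}")
    if window is None:
        # Infer the window so that it covers the last year.
        window = DEFAULT_WINDOWS[freq]
    out: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
            continue
        total = sum(values[i + 1 - window : i + 1])
        out.append(total if operation == "sum" else total / window)
    return out


quarterly = [1.0, 2.0, 3.0, 4.0, 5.0]
rolled_sum = rolling(quarterly, "QE-DEC")
rolled_mean = rolling(quarterly, "QE-DEC", operation="mean")
```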
- chg_diff(operation: Literal['chg', 'diff'] = 'chg', period: Literal['last', 'inter', 'annual'] = 'last') Dataset[source]
Wrapper for the pct_change and diff Pandas methods.
Calculate percentage change or difference for dataframes. The period argument takes into account the frequency of the dataframe, i.e., inter (for interannual) will calculate pct change/differences with periods=4 for quarterly frequency, but periods=12 for monthly frequency.
- Parameters:
operation ({'chg', 'diff'}) – chg for percent change or diff for differences.
period ({'last', 'inter', 'annual'}) – Period with which to calculate change or difference. last for previous period (last month for monthly data), inter for same period last year, annual for same period last year but taking annual sums.
- Return type:
Dataset
- Raises:
ValueError – If the dataframe is not of frequency ME (month), QE or QE-DEC (quarter), or YE or YE-DEC (year).
ValueError – If the operation parameter does not have a valid argument.
ValueError – If the period parameter does not have a valid argument.
ValueError – If the input dataframe’s columns do not have the appropriate levels.
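A rough sketch of how the period argument maps to a lag, using plain lists. The quarterly and monthly lags come from the description above; the yearly lag of 1 is an assumption, the annual option is left out, and changes are returned as fractions (the library may scale them differently). The function below is a hypothetical stand-in that only borrows the method's name.

```python
from typing import List, Optional

# Lags for period="inter". Quarterly (4) and monthly (12) are documented;
# the yearly value of 1 is an assumption.
INTER_LAGS = {"ME": 12, "QE": 4, "QE-DEC": 4, "YE": 1, "YE-DEC": 1}


def chg_diff(
    values: List[float],
    freq: str,
    operation: str = "chg",
    period: str = "last",
) -> List[Optional[float]]:
    """Change (as a fraction) or difference against a frequency-aware lag."""
    if freq not in INTER_LAGS:
        raise ValueError(f"unsupported frequency: {freq!r}")
    if operation not in ("chg", "diff"):
        raise ValueError(f"invalid operation: {operation!r}")
    if period not in ("last", "inter"):
        raise ValueError(f"period not covered by this sketch: {period!r}")
    lag = 1 if period == "last" else INTER_LAGS[freq]
    out: List[Optional[float]] = []
    for i, value in enumerate(values):
        if i < lag:
            out.append(None)  # not enough history yet
            continue
        previous = values[i - lag]
        out.append(value - previous if operation == "diff" else value / previous - 1)
    return out


quarterly = [100.0, 102.0, 104.0, 106.0, 110.0, 112.2]
interannual = chg_diff(quarterly, "QE-DEC", operation="chg", period="inter")
differences = chg_diff(quarterly, "QE-DEC", operation="diff", period="last")
```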
- rebase(start_date: str | datetime, end_date: str | datetime | None = None, base: float = 100.0) Dataset[source]
Rebase dataset to a date or range of dates.
- Parameters:
start_date (string or datetime.datetime) – Date to which series will be rebased.
end_date (string or datetime.datetime, default None) – If specified, series will be rebased to the average between start_date and end_date.
base (float, default 100) – Float for which start_date == base or average between start_date and end_date == base.
- Return type:
Dataset
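The arithmetic behind rebasing can be sketched as follows: divide every value by the reference value (a single date, or the average over a date range) and multiply by base. The example uses a plain dict keyed by ISO-style date strings; the function is a hypothetical illustration, not the library's code.

```python
from statistics import mean
from typing import Dict, Optional


def rebase(
    series: Dict[str, float],
    start_date: str,
    end_date: Optional[str] = None,
    base: float = 100.0,
) -> Dict[str, float]:
    """Scale a series so the reference period equals base."""
    if end_date is None:
        reference = series[start_date]
    else:
        # Average over the inclusive range; ISO-style strings sort by date.
        reference = mean(v for d, v in series.items() if start_date <= d <= end_date)
    return {d: v / reference * base for d, v in series.items()}


index = {"2020-01": 80.0, "2020-02": 100.0, "2020-03": 120.0}
single = rebase(index, "2020-01")
ranged = rebase(index, "2020-01", "2020-03")
```

In `single` the first observation equals 100; in `ranged` the average of the three observations equals 100.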
- convert(flavor: Literal['usd', 'real', 'gdp'], start_date: str | datetime | None = None, end_date: str | datetime | None = None, error_handling: Literal['raise', 'coerce', 'ignore'] = 'raise') Dataset[source]
Convert dataset from UYU to USD, from UYU to real UYU or from UYU/USD to % GDP.
- flavor=usd: Convert a dataset from Uruguayan pesos to US dollars. Takes into account whether the input dataset is flow or stock, in order to choose end-of-period or monthly average NXR. Also takes into account the input dataframe’s frequency and whether columns represent rolling averages or sums.
- flavor=real: Convert a dataset’s columns to real prices. Takes into account the input dataset’s frequency and whether columns represent rolling averages or sums. Allows choosing a single period, a range of dates or no period as a base (i.e., period for which the average/sum of input dataframe and output dataframe is the same).
- flavor=gdp: Convert a dataset to percentage of GDP. Takes into account the input dataset’s currency for choosing UYU or USD GDP. If the frequency of the input dataset is higher than quarterly, GDP will be upsampled and linear interpolation will be performed to complete missing data. If the input dataframe’s “cumulative_periods” level is not 12 for monthly frequency or 4 for quarterly frequency, calculate rolling input dataframe.
In all cases, if the input dataframe’s frequency is higher than monthly (daily, business, etc.), resample to monthly frequency.
- Parameters:
flavor (str) – usd for USD, real for real UYU, gdp for % GDP.
start_date (str, datetime.date or None, default None) – Only used if flavor=real. If set to a date-like string or a date, and end_date is None, the base period will be start_date.
end_date (str, datetime.date or None, default None) – Only used if flavor=real. If start_date is set, calculate so that the data is in constant prices of start_date-end_date.
error_handling ({"raise", "coerce", "ignore"}, default "raise") – What to do when the input dataset can’t be converted. Coercion will set to np.nan, while “ignore” is a no-op. If “raise”, will raise an error.
- Return type:
Dataset
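As an illustration of the flavor=real idea with a single base period, the sketch below deflates a nominal series by a price index so that the base period keeps its nominal value. This is textbook deflation under stated assumptions (a price index aligned period by period with the data); `to_real` and the sample figures are hypothetical and do not reproduce the library's handling of frequencies or rolling sums.

```python
from typing import Dict


def to_real(
    nominal: Dict[str, float],
    price_index: Dict[str, float],
    base_period: str,
) -> Dict[str, float]:
    """Express nominal values in constant prices of base_period."""
    base_level = price_index[base_period]
    # In the base period the ratio base_level / price_index is 1, so the
    # real value equals the nominal value there.
    return {p: v / price_index[p] * base_level for p, v in nominal.items()}


# Made-up figures: nominal values grow exactly in line with prices.
nominal = {"2021": 100.0, "2022": 110.0, "2023": 121.0}
price_index = {"2021": 100.0, "2022": 110.0, "2023": 121.0}
real_2021 = to_real(nominal, price_index, "2021")
real_2023 = to_real(nominal, price_index, "2023")
```

Since the sample series only grows with prices, the real series is flat: 100 in 2021 prices, 121 in 2023 prices.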
- decompose(method: Literal['x13', 'loess', 'mloess', 'moving_averages'] = 'x13', fallback: Literal['loess', 'mloess', 'moving_averages'] = 'loess', component: Literal['t-c', 'sa'] = 'sa', fn_kwargs: dict | None = None, ignore_warnings: bool = True, error_handling: Literal['raise', 'coerce', 'ignore'] = 'raise') Dataset[source]
Decompose the dataset and return its trend-cycle or seasonally adjusted component.
- Parameters:
method ({'x13', 'loess', 'mloess', 'moving_averages'}, default 'x13') – Method to use for decomposition.
fallback ({'loess', 'mloess', 'moving_averages'}, default 'loess') – Fallback method to use if the main method fails.
component ({'t-c', 'sa'}, default 'sa') – Component to return. ‘t-c’ for trend-cycle, ‘sa’ for seasonally adjusted.
fn_kwargs (dict, default None) – Additional keyword arguments to pass to the decomposition function.
ignore_warnings (bool, default True) – Whether to ignore warnings.
error_handling ({"raise", "coerce", "ignore"}, default "raise") – What to do when the input dataset can’t be converted. Coercion will set to np.nan, while “ignore” is a no-op. If “raise”, will raise an error.
- Return type:
Dataset
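The method/fallback pair can be read as "try the main method, and use the fallback if it fails". A minimal sketch of that pattern follows; `failing_primary` and `trailing_mean` are hypothetical stand-ins for real decomposition routines, and the trailing mean is only a crude smoother chosen to keep the example short.

```python
from typing import Callable, Dict, List, Tuple

Series = List[float]


def failing_primary(series: Series) -> Series:
    """Hypothetical main method that is unavailable in this environment."""
    raise RuntimeError("primary method unavailable")


def trailing_mean(series: Series, window: int = 4) -> Series:
    """Crude smoother: mean of up to `window` most recent observations."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i + 1 - window) : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def decompose(
    series: Series,
    methods: Dict[str, Callable[[Series], Series]],
    method: str,
    fallback: str,
) -> Tuple[Series, str]:
    """Run the main method; on any error, run the fallback instead."""
    try:
        return methods[method](series), method
    except Exception as exc:
        print(f"{method} failed ({exc}); using {fallback}")
        return methods[fallback](series), fallback


methods = {"x13": failing_primary, "moving_averages": trailing_mean}
smoothed, used = decompose([1.0, 3.0, 5.0, 7.0, 9.0], methods, "x13", "moving_averages")
```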
- class econuy.base.DatasetMetadata(name: str, indicator_metadata: dict, created_at: datetime | None = None, config: DatasetConfig | None = None)[source]
Bases:
object
- property indicator_ids: list
Get the list of indicators in the metadata.
- Returns:
The list of indicators.
- Return type:
list
- property has_common_metadata: bool
Check if all indicators have the same metadata.
- Returns:
True if all indicators have the same metadata, False otherwise.
- Return type:
bool
- property common_metadata_dict: dict
Get the common metadata dictionary.
- Returns:
The metadata dictionary excluding keys that are not common to all indicators.
- Return type:
dict
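One way to read "common metadata" is the set of key/value pairs shared by every indicator. The sketch below implements that reading on plain dicts; it is an assumption about the semantics, and the indicator IDs and metadata keys are made up for illustration.

```python
from typing import Any, Dict


def common_metadata(indicator_metadata: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Keep only the key/value pairs that are identical for all indicators."""
    metas = list(indicator_metadata.values())
    if not metas:
        return {}
    first, rest = metas[0], metas[1:]
    return {
        key: value
        for key, value in first.items()
        if all(key in meta and meta[key] == value for meta in rest)
    }


# Hypothetical metadata for two indicators that differ only in their unit.
indicator_metadata = {
    "ind_a": {"unit": "UYU", "frequency": "ME", "area": "prices"},
    "ind_b": {"unit": "USD", "frequency": "ME", "area": "prices"},
}
common = common_metadata(indicator_metadata)
# Under this reading, all metadata is common only if nothing was dropped.
has_common = all(meta == common for meta in indicator_metadata.values())
```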
- update_indicator_metadata(indicator: str, single_indicator_metadata: dict) DatasetMetadata[source]
Update the metadata for a specific indicator.
- Parameters:
indicator (str) – The indicator to update.
single_indicator_metadata (dict) – The metadata to update with.
- Returns:
The updated metadata.
- Return type:
Metadata
- update_indicator_metadata_value(indicator: str, key: str, value: str) DatasetMetadata[source]
Update the metadata for a specific indicator.
- Parameters:
indicator (str) – The indicator to update.
key (str) – The key to update.
value (str) – The value to update with.
- Returns:
The updated metadata.
- Return type:
Metadata
- update_dataset_metadata(indicator_metadata: dict) DatasetMetadata[source]
Update the metadata for all indicators.
- Parameters:
indicator_metadata (dict) – The new metadata to update with.
- Returns:
The updated metadata.
- Return type:
Metadata
- add_transformation_step(transformation: dict) DatasetMetadata[source]
Add a transformation step to the metadata.
- Parameters:
transformation (dict) – The transformation step to add.
- Returns:
The updated metadata.
- Return type:
Metadata
- copy() DatasetMetadata[source]
Create a copy of the metadata.
- Returns:
The copied metadata.
- Return type:
Metadata
- classmethod from_cast(name: str, base_metadata: dict, indicator_ids: list, indicator_names: list) DatasetMetadata[source]
Create a metadata instance from cast metadata.
- Parameters:
name (str) – The name of the dataset.
base_metadata (dict) – The base metadata.
indicator_ids (list) – The names of the indicators.
indicator_names (list) – The full names of the indicators.
- Returns:
The created metadata instance.
- Return type:
Metadata
- classmethod from_metadatas(name: str, metadatas: List[DatasetMetadata]) DatasetMetadata[source]
Create a metadata instance from a list of metadatas.
- Parameters:
metadatas (list) – The list of metadatas.
- Returns:
The created metadata instance.
- Return type:
Metadata
- classmethod from_json(path: str | Path) DatasetMetadata[source]
Create a metadata instance from a JSON file.
- Parameters:
path (str or Path) – The path to the JSON file.
- Returns:
The created metadata instance.
- Return type:
Metadata
Utilities
- class econuy.utils.operations.DatasetRegistry[source]
Bases:
object
- get_multiple(names: List[str]) Dict[source]
Retrieve multiple datasets by their names.
- Parameters:
names (List[str]) – A list of dataset names to retrieve.
- Returns:
A dictionary containing the requested datasets.
- Return type:
dict
- get_available() Dict[source]
Retrieve all available datasets that are not disabled.
- Returns:
A dictionary containing all available datasets.
- Return type:
dict
- get_custom() Dict[source]
Retrieve all custom datasets.
- Returns:
A dictionary containing all custom datasets.
- Return type:
dict
- get_by_area(area: str, keep_disabled: bool = False, keep_auxiliary: bool = False) Dict[source]
Retrieve datasets by a specific area, with options to include disabled and auxiliary datasets.
- Parameters:
area (str) – The area to filter datasets by.
keep_disabled (bool, optional) – Whether to include disabled datasets (default is False).
keep_auxiliary (bool, optional) – Whether to include auxiliary datasets (default is False).
- Returns:
A dictionary containing the datasets that match the specified area and options.
- Return type:
dict
- list_available() List[str][source]
List the names of all available datasets.
- Returns:
A list of names of all available datasets.
- Return type:
List[str]
- list_custom() List[str][source]
List the names of all custom datasets.
- Returns:
A list of names of all custom datasets.
- Return type:
List[str]
- list_by_area(area: str, keep_disabled: bool = False, keep_auxiliary: bool = False) List[str][source]
List the names of datasets by a specific area, with options to include disabled and auxiliary datasets.
- Parameters:
area (str) – The area to filter datasets by.
keep_disabled (bool, optional) – Whether to include disabled datasets (default is False).
keep_auxiliary (bool, optional) – Whether to include auxiliary datasets (default is False).
- Returns:
A list of names of datasets that match the specified area and options.
- Return type:
List[str]
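The filtering options shared by get_by_area and list_by_area can be sketched against a plain dictionary. The registry layout below (an entry per dataset with `area`, `disabled` and `auxiliary` keys) and the dataset names are assumptions made for illustration; the real registry's structure is not shown in this reference.

```python
from typing import Dict, List

# Hypothetical registry contents.
REGISTRY: Dict[str, Dict[str, object]] = {
    "cpi": {"area": "prices", "disabled": False, "auxiliary": False},
    "cpi_old": {"area": "prices", "disabled": True, "auxiliary": False},
    "cpi_weights": {"area": "prices", "disabled": False, "auxiliary": True},
    "gdp": {"area": "activity", "disabled": False, "auxiliary": False},
}


def list_by_area(
    registry: Dict[str, Dict[str, object]],
    area: str,
    keep_disabled: bool = False,
    keep_auxiliary: bool = False,
) -> List[str]:
    """Names of datasets in an area, optionally keeping disabled/auxiliary ones."""
    return [
        name
        for name, meta in registry.items()
        if meta["area"] == area
        and (keep_disabled or not meta["disabled"])
        and (keep_auxiliary or not meta["auxiliary"])
    ]


default_names = list_by_area(REGISTRY, "prices")
all_names = list_by_area(REGISTRY, "prices", keep_disabled=True, keep_auxiliary=True)
```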