demand_acep package
Submodules
demand_acep.demand_acep module
This module contains the code for the demand_acep package. More documentation to come.
This is a test docstring for the whole module.
-
demand_acep.demand_acep.build_interpolation(y_values, n_val)
This function performs the actual 1-D interpolation. If the number of consecutive missing points is less than 3, linear interpolation is used; otherwise, cubic interpolation is used.
Parameters: - y_values :
y_values are the known values from which the interpolating function is built, that is, y_values = f(x).
- n_val :
n_val is the number of consecutive missing points that need to be filled.
Returns: - y_interp
Array of interpolated values, equal in length to the number of missing data points (n_val).
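The linear-versus-cubic rule described above can be sketched in plain Python. This is only an illustration under stated assumptions: `fill_gap` and `lagrange_eval` are hypothetical helpers, not part of demand_acep, and the cubic branch fits a cubic polynomial through the two known points on each side of the gap, which may differ from the cubic method the package actually uses.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the polynomial through the points (xs, ys) at position x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fill_gap(left, right, n_val):
    """Fill a gap of n_val points (hypothetical helper, for illustration).

    left  -- known values immediately before the gap
    right -- known values immediately after the gap
    """
    # Fewer than 3 missing points: linear (1 neighbor per side).
    # Otherwise: cubic (2 neighbors per side, i.e. 4 points in total).
    k = 1 if n_val < 3 else 2
    if len(left) < k or len(right) < k:
        raise ValueError("not enough known points around the gap")
    # Gap positions are 0 .. n_val-1; known points sit just outside them.
    xs = list(range(-k, 0)) + list(range(n_val, n_val + k))
    ys = list(left[-k:]) + list(right[:k])
    return [lagrange_eval(xs, ys, x) for x in range(n_val)]
```

In both branches the returned list has exactly n_val entries, matching the description of y_interp.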
-
demand_acep.demand_acep.compute_interpolation(df)
This function imputes missing measurement data (NaN) in a series using 1-D interpolation.
Parameters: - df :
df is a series containing missing measurement values.
Returns: - Series
Filled pandas series with no missing values.
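Before a gap can be filled, its position and length have to be known. As a rough sketch of that step, and not the package's own code, the hypothetical helper `missing_runs` below scans a sequence of floats and reports each run of consecutive NaN values as a (start index, run length) pair.

```python
import math


def missing_runs(values):
    """Return (start_index, length) for each run of consecutive NaNs.

    Hypothetical helper for illustration; assumes `values` holds floats.
    """
    runs = []
    start = None
    for i, v in enumerate(values):
        if isinstance(v, float) and math.isnan(v):
            if start is None:
                start = i  # a new run of missing points begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        # The sequence ended while still inside a run of NaNs.
        runs.append((start, len(values) - start))
    return runs
```

Each reported length corresponds to the n_val argument described for build_interpolation.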
-
demand_acep.demand_acep.data_impute(impute_df)
This function imputes missing measurements in a dataframe using 1-D interpolation. If the number of consecutive missing points is less than 3, linear interpolation is used; otherwise, cubic interpolation is used.
Parameters: - impute_df :
impute_df can be either a dataframe or a dictionary of dataframes containing missing measurement values.
Returns: - Dataframe
Filled pandas dataframe with no missing values.
-
demand_acep.demand_acep.data_resample(df, sample_time='1T')
This function downsamples a sample-time indexed pandas dataframe of measurement channel values to the sample time supplied, taking the mean of the values within each resolution interval. It uses the pandas df.resample method.
Parameters: - df :
df is a sample-time indexed pandas dataframe containing measurement values from the different channels of each meter.
- sample_time :
sample_time determines the desired resolution of the downsampled data, for example 1T for 1 minute, 1H for 1 hour, 1D for 1 day and 1M for 1 month. The default is 1 minute.
Returns: - Dataframe
Resampled-time indexed pandas dataframe containing downsampled measurement values from the given dataframe.
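A minimal sketch of mean-based downsampling with pandas follows. The helper name `resample_mean`, the column name `kW` and the sample values are invented for illustration. The `1T` alias shown in the signature is deprecated in pandas 2.2 and later in favor of `min`, so the sketch uses `1min`, which older versions also accept.

```python
import pandas as pd


def resample_mean(df, sample_time="1min"):
    """Downsample a time-indexed dataframe using the mean of each interval.

    Hypothetical stand-in for illustration, not the package's function.
    """
    return df.resample(sample_time).mean()


# Six readings taken 20 seconds apart (invented values).
index = pd.date_range("2018-07-02 08:00:00", periods=6,
                      freq=pd.Timedelta(seconds=20))
df = pd.DataFrame({"kW": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)

# Each one-minute bin averages the three readings that fall inside it.
resampled = resample_mean(df)
```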
-
demand_acep.demand_acep.extract_data(dirpath, filename)
This function reads and extracts the NetCDF-format data of the given meter channel using the xarray package.
Parameters: - dirpath :
dirpath is the path of the directory containing the NetCDF file to be read.
- filename :
filename is the NetCDF format meter channel file to be read.
Returns: - Dataframe
Sample-time indexed pandas dataframe containing measurement values from the given file.
-
demand_acep.demand_acep.extract_ppty(filename, meter_name)
This function parses the given filename as a string to determine the meter name and the measurement channel/type contained in the file.
Parameters: - filename :
filename is the NetCDF format meter channel file whose name contains information such as location, date, meter type, measurement channel/type and sampling frequency. An example filename is: ‘PokerFlatResearchRange-PokerFlat-PkFltM1AntEaDel@2018-07-02T081007Z@PT23H@PT146F.nc’
- meter_name :
meter_name is a list containing the names of the meters at Poker Flat.
Returns: - meter : string
meter is the meter name of the NetCDF format file given.
- channel : string
channel is the measurement type contained in the NetCDF format file given.
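One way such a filename could be parsed is sketched below. The helper `parse_meter_and_channel` is hypothetical, and the split of `PkFltM1AntEaDel` into the meter `PkFltM1Ant` and the channel `EaDel` is an assumption made for illustration; the real meter names come from the meter_name list.

```python
def parse_meter_and_channel(filename, meter_names):
    """Split a meter channel filename into (meter, channel).

    Hypothetical helper; assumes the token before the first '@' ends with
    '<meter><channel>' and that the meter is one of `meter_names`.
    """
    prefix = filename.split("@", 1)[0]   # location and meter/channel part
    token = prefix.rsplit("-", 1)[-1]    # e.g. 'PkFltM1AntEaDel'
    for meter in meter_names:
        if token.startswith(meter):
            return meter, token[len(meter):]
    raise ValueError("no known meter name found in " + filename)


example = ("PokerFlatResearchRange-PokerFlat-PkFltM1AntEaDel"
           "@2018-07-02T081007Z@PT23H@PT146F.nc")
# The meter list here is assumed for the example.
meter, channel = parse_meter_and_channel(example, ["PkFltM1Ant"])
```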
-
demand_acep.demand_acep.long_missing_data_prep(dirpath, filename)
This function prepares a CSV dataset with missing days, months or years for interpolation by the data_impute function. It fills in the missing timestamps in a ‘DatetimeIndex’ and assigns a value of NaN to the missing data points.
Parameters: - dirpath :
dirpath is the path of the directory containing the csv file with the missing data points, already downsampled to a 1-minute interval.
- filename :
filename is the csv file containing the missing data points to be read.
Returns: - Dataframe
pandas dataframe with a ‘DatetimeIndex’ and a value of NaN assigned to the missing data points.
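The idea of inserting the missing timestamps and marking them NaN can be sketched with pandas `reindex`. The helper `add_missing_minutes`, the column name `kW` and the values are hypothetical; the sketch assumes data already at a 1-minute resolution and skips the csv reading step.

```python
import pandas as pd


def add_missing_minutes(df):
    """Reindex to a complete 1-minute index; absent rows become NaN.

    Hypothetical helper for illustration, not the package's function.
    """
    full_index = pd.date_range(df.index.min(), df.index.max(), freq="min")
    return df.reindex(full_index)


# Readings at 08:00, 08:01 and 08:04; 08:02 and 08:03 are missing.
index = pd.to_datetime(["2018-07-02 08:00", "2018-07-02 08:01",
                        "2018-07-02 08:04"])
df = pd.DataFrame({"kW": [1.0, 2.0, 5.0]}, index=index)
filled = add_missing_minutes(df)
```

The resulting dataframe has one row per minute and can then be passed to an imputation step.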
demand_acep.create_db_schema module
This file contains functions that create the schema in the database.
-
demand_acep.create_db_schema.create_schema_from_source_files(sql_engine, config)
This function reads the source file and extracts the table names and corresponding column names.
Parameters: - sql_engine : SQLAlchemy engine
sql_engine should support database operations.
- channel_metadata_file : string
channel_metadata_file represents the absolute path of the channel metadata file including the filename.
- years_file : string
years_file represents the absolute path of the file containing the years for which the tables should be created. This file should be a text file with one year per row and the column header “years”.
Returns: - List of table names in the database.
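The years file format described above (one year per row under the header “years”) can be read with the standard csv module. The helper `read_years` is a hypothetical sketch that takes an open file object, and the in-memory file stands in for a real path.

```python
import csv
import io


def read_years(years_file):
    """Read years from a file object with a 'years' column header.

    Hypothetical helper for illustration, not the package's function.
    """
    reader = csv.DictReader(years_file)
    return [int(row["years"]) for row in reader if row["years"].strip()]


# An in-memory stand-in for the years file (invented contents).
sample = io.StringIO("years\n2017\n2018\n")
years = read_years(sample)
```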
demand_acep.extract_data_to_csv module
The functions in this file convert the NetCDF files to dataframes and then write them to disk using multiprocessing, so on a multi-core system the extraction can happen in parallel.
-
demand_acep.extract_data_to_csv.extract_csv_for_date(config, data_date)
This function writes the data for the data_date specified to a csv on disk.
Parameters: - config :
config contains the configuration (paths etc.) needed by the functions. Passing it as an argument makes it possible to switch between production and test configurations.
- data_date : string
data_date string will be used to extract the year and the path to the data. It should be in the ‘mm/dd/yyyy’ format.
Returns: - list
Names of the csv files written to disk.
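Since data_date must be in the ‘mm/dd/yyyy’ format, a caller may want to validate it and pull out the year up front. The sketch below uses only the standard library, and `parse_data_date` is a hypothetical helper name.

```python
from datetime import datetime


def parse_data_date(data_date):
    """Parse an 'mm/dd/yyyy' string; raises ValueError on other formats.

    Hypothetical helper for illustration, not the package's function.
    """
    return datetime.strptime(data_date, "%m/%d/%Y")


parsed = parse_data_date("07/02/2018")
year = parsed.year  # the year can then be used to locate the data
```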
demand_acep.timescale_parallel_copy module
Functions in this file insert the data into the TimescaleDB database using timescaledb-parallel-copy, the Go utility provided by TimescaleDB.
-
demand_acep.timescale_parallel_copy.parallel_copy_data_for_date(config, data_date)
This function parallel copies the data for the data_date specified to the appropriate table of TSDB.
Parameters: - config :
config contains the configuration (paths etc.) needed by the functions. Passing it as an argument makes it possible to switch between production and test configurations.
- data_date : string
data_date string will be used to extract the year and the path to the data. It should be in the ‘mm/dd/yyyy’ format.
Returns: - None - prints the number of rows copied into the database for the particular day.
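As a rough illustration of how a timescaledb-parallel-copy call might be assembled from Python, the sketch below only builds the argument list and does not run anything. The helper `build_copy_command`, the database name, table name and file path are all invented, and the flag names are written from memory of the tool's README, so they should be checked against the installed version.

```python
def build_copy_command(db_name, table, csv_path, workers=4):
    """Build an argument list for timescaledb-parallel-copy.

    Hypothetical helper; flag names are assumed from the tool's README
    and should be verified with `timescaledb-parallel-copy --help`.
    """
    return [
        "timescaledb-parallel-copy",
        "--db-name", db_name,
        "--table", table,
        "--file", csv_path,
        "--workers", str(workers),
        "--copy-options", "CSV",
    ]


# Invented names, for illustration only.
command = build_copy_command("demand_acep", "meter_2018", "/tmp/meter.csv")
```

A list like this could be handed to `subprocess.run(command, check=True)` once the flags are confirmed.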
-
demand_acep.timescale_parallel_copy.parallel_copy_data_for_dates(config, start, end)
This function extracts the data to disk for the data_date_range specified.
Parameters: - config :
config contains the configuration (paths etc.) needed by the functions. Passing it as an argument makes it possible to switch between production and test configurations.
- start : str or datetime-like
Left bound for generating dates.
- end : str or datetime-like
Right bound for generating dates.
Returns: - None
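The start and end bounds read like the arguments of pandas `date_range`. Assuming that is how the dates are generated, the sketch below expands a range into the ‘mm/dd/yyyy’ strings that the per-date functions expect. `dates_between` is a hypothetical helper, not part of the package.

```python
import pandas as pd


def dates_between(start, end):
    """List every day from start to end (inclusive) as 'mm/dd/yyyy'.

    Hypothetical helper for illustration; assumes daily frequency.
    """
    days = pd.date_range(start=start, end=end)  # daily by default
    return days.strftime("%m/%d/%Y").tolist()


data_dates = dates_between("2018-07-01", "2018-07-03")
```

Each string in the list could then be passed as data_date to a single-date function.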