Source information:
Prediction and explanation of the formation of the Spanish day-ahead electricity price through machine learning regression
Original link: https://www.sciencedirect.com/science/article/pii/S0306261919302260
Highlights
We propose a regression-tree-based method for modeling electricity price formation.
The explanatory variables are extracted from publicly accessible energy-related data.
The energy-related data are free and published by the TSO in a graphical interface.
The model shows good accuracy in predicting the price formation.
It also allows for a non-linear analysis of the dependence of price on predictors.
Overview
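Until recently, the detailed information on the state of the power system needed to estimate future spot prices by regression analysis was largely restricted to qualified parties. To ensure transparency in operation, however, the Spanish Transmission System Operator has launched an information website where a large amount of real-time energy-related data can be consulted through a graphical interface. This gives non-qualified parties the opportunity to develop applications and algorithms that require price forecasts and knowledge of how the price is determined. The paper explores the use of data extracted from that interface with two aims: predicting the day-ahead price in a simple way, and exploring the influence of the underlying energy drivers on it. For prediction, the authors specify a quantile regression model based on Gradient Boosted Regression Trees. It is more accurate than multiple linear regression models at the cost of greater complexity, while its specification and tuning remain simpler than those of other machine learning approaches. The computed metrics show that, with the median used as the point prediction, the model yields very low prediction errors (RMSE = 2.78 €/MWh, MAE = 1.94 €/MWh, MAPE = 0.059). The quantile regression model also inherently allows prediction intervals to be defined, which offer a different interpretation of accuracy. The results show that, on average, the prediction error stays within 6.8 €/MWh 90% of the time.
The paper also applies a partial dependence analysis to the model. This analysis, as far as the authors know the first used to study the formation of electricity prices, proved very useful for detecting highly non-linear relationships.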
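The point-error metrics (RMSE, MAE, MAPE) and the quantile-based prediction interval mentioned in the summary above can be illustrated with a minimal pure-Python sketch. This is not the authors' code. The price values are made up, and the 5th and 95th percentile bounds are an assumption, since the text does not state which quantiles define the interval. The pinball loss shown is the standard objective minimized when fitting a model to a given quantile.

```python
import math

def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss at quantile level q, with 0 < q < 1."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1.0) * diff
    return total / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction (like 0.059)."""
    return sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)

def interval_coverage(y_true, lower, upper):
    """Share of observations that fall inside [lower, upper]."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)

# Hypothetical prices in EUR/MWh, invented for illustration only.
actual = [50.0, 60.0, 70.0]
median_forecast = [48.0, 63.0, 70.0]   # assumed output of a q = 0.5 model
lower_5 = [45.0, 61.0, 65.0]           # assumed output of a q = 0.05 model
upper_95 = [55.0, 66.0, 75.0]          # assumed output of a q = 0.95 model

print("RMSE:", rmse(actual, median_forecast))
print("MAE:", mae(actual, median_forecast))
print("MAPE:", mape(actual, median_forecast))
print("Pinball loss at q=0.5:", pinball_loss(actual, median_forecast, 0.5))
print("Interval coverage:", interval_coverage(actual, lower_5, upper_95))
```

At q = 0.5 the pinball loss is half the MAE, which is why the median forecast is a natural point prediction for a quantile model.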
Abstract
Until recently, detailed information on the power system state to estimate future spot prices by regression analysis was generally restricted to qualified parties. However, to ensure transparency in operation, the Spanish Transmission System Operator has launched an informative web in which a sizable amount of real-time energy-related data can be consulted through a graphical interface. Undoubtedly, this provides the opportunity for non-qualified parties to develop applications and algorithms in which price forecast and maybe knowledge about how price is determined are required.
This paper approaches the use of data extracted from that interface with two aims: the prediction of the day-ahead price in a simple way, and the exploration of the influence that the underlying energy drivers have on it. For the prediction we specified a quantile regression model based on Gradient Boosted Regression Trees. It improves the accuracy over multiple linear regression models at the cost of more complexity, and still it has simpler specification and tuning compared to other machine learning approaches. The calculated metrics show that our model produces remarkably low prediction errors when using the median as point prediction method (RMSE = 2.78 €/MWh, MAE = 1.94 €/MWh, and MAPE = 0.059). Interestingly, the quantile regression model also allows to inherently define prediction intervals, with a different interpretation of accuracy. Our results show that on average 90% of times the prediction error will not exceed 6.8 €/MWh.
We also implemented a partial dependence analysis on that model. This implementation, as far as we know the first time employed to analyze the formation of electricity prices, has shown to be of significant usefulness in detecting highly non-linear relationships.
Keywords
Linear regression; Principal components; Quantile regression; Gradient boosting regression; Day-ahead electricity price
Schematics
Fig. 1. Summary of the methodology.
Fig. 2. Pearson correlation between the variables employed in the GBRT and PCR models.
Fig. 8. Variable importance (only the most important, 20 out of 66, are represented) of the percentile 50 prediction model.
Fig. 10. Partial dependence (indeed the deviation, as in Fig. 9) between non-categorical predictors and the predicted day-ahead price. The four categories of predictors described in Section 2.2 are separately plotted. From top to bottom: forecasts, availability of international links, available dispatchable generation, and power generation at 11.00 a.m.
Fig. 11. Partial dependence between coal-based generation and the day-ahead price, depending on the values ahead of 11.00 a.m.
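The partial dependence curves named in the captions above can be illustrated with a minimal sketch. One predictor is forced to each value on a grid, the other predictors keep their observed values, and the model's predictions are averaged. The toy price model, the two predictor names, and all numbers are hypothetical stand-ins for a fitted GBRT model. Centering the curve on the average prediction is only one plausible reading of the "deviation" mentioned in the Fig. 10 caption.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # override only the chosen feature
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def toy_price_model(row):
    """Hypothetical stand-in for a fitted model (not the paper's GBRT).

    Price rises with the demand forecast and falls with the wind forecast,
    but the wind effect saturates at 6000 MW, which makes it non-linear.
    """
    demand_forecast, wind_forecast = row
    return 20.0 + demand_forecast / 1000.0 - min(wind_forecast, 6000.0) / 500.0

# Hypothetical observations: (demand forecast in MW, wind forecast in MW).
rows = [(30000.0, 4000.0), (28000.0, 8000.0)]
wind_grid = [0.0, 6000.0, 12000.0]

curve = partial_dependence(toy_price_model, rows, 1, wind_grid)

# Deviation from the average prediction on the unmodified data.
baseline = sum(toy_price_model(r) for r in rows) / len(rows)
deviation = [value - baseline for value in curve]

print("Partial dependence:", curve)
print("Deviation from baseline:", deviation)
```

The flat segment between 6000 and 12000 MW shows the kind of saturation effect that a linear regression coefficient cannot express and a partial dependence curve makes visible.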