SOYBEAN PRICE FORECASTING VIA A HYBRID LSTM-LLM ARCHITECTURE: STATISTICAL AND ECONOMIC EVALUATION OF BRAZILIAN AGRIBUSINESS NEWS SENTIMENT
DOI:
https://doi.org/10.56238/revgeov16n13-049
Keywords:
Agricultural Commodities, Recurrent Neural Networks, Sentiment Analysis, Bayesian Optimization, Model Confidence Set
Abstract
Soybean is Brazil's main agricultural commodity, and the volatility of its prices poses significant challenges for producers, traders, and policymakers, given the market's nonlinear dependence on exogenous factors such as weather conditions, trade policies, and the flow of news. This study investigates the extent to which incorporating textual sentiment extracted by LLMs specialized in agribusiness improves the predictive accuracy and economic value of LSTM models for forecasting the soybean futures contract (SJCc1). To that end, six architectures were compared empirically (a naïve benchmark, a pure LSTM, an LSTM with a frozen LLM using scalar and probabilistic outputs, and end-to-end versions of both), using 3,261 price records and a corpus of 27,024 Brazilian agribusiness news items, with fine-tuning on 1,000 labeled news items and Bayesian hyperparameter optimization via TPE. The statistical comparison used the Model Confidence Set (MCS) procedure at the 90% confidence level, complemented by a paired block bootstrap test for cumulative return. Only the LSTM+LLM architecture with probabilistic output joined the naïve benchmark in the MCS, and it was the only one to generate a statistically significant cumulative return over buy-and-hold (58.27%; p ≈ 0.003; Sharpe ratio: 1.74), with a larger advantage in periods of high volatility. The predictive gain is concluded to stem from the specific combination of a specialized LLM and probabilistic encoding of sentiment, not from textual integration per se.
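To make the paired block bootstrap test for cumulative return more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the bootstrap variant, block length, or number of replications, so the circular moving-block scheme, `block_len`, `n_boot`, the use of log returns, and the function name `paired_block_bootstrap_pvalue` are all assumptions made for illustration.

```python
import random


def paired_block_bootstrap_pvalue(strategy_log_returns, benchmark_log_returns,
                                  block_len=10, n_boot=2000, seed=0):
    """One-sided test that the strategy's cumulative log return exceeds the benchmark's.

    Hypothetical helper: a circular moving-block bootstrap on the paired
    return differences. Defaults are illustrative, not taken from the paper.
    """
    if len(strategy_log_returns) != len(benchmark_log_returns):
        raise ValueError("series must have the same length")
    n = len(strategy_log_returns)

    # Pairing: work on per-period differences so both series share the same dates.
    diffs = [s - b for s, b in zip(strategy_log_returns, benchmark_log_returns)]
    observed = sum(diffs)  # cumulative log-return difference

    # Impose the null hypothesis (no excess return) by centering the differences.
    mean_diff = observed / n
    centered = [d - mean_diff for d in diffs]

    rng = random.Random(seed)
    n_blocks = -(-n // block_len)  # ceiling division
    exceed = 0
    for _ in range(n_boot):
        resampled = []
        for _ in range(n_blocks):
            start = rng.randrange(n)
            # Blocks wrap around the end of the sample (circular scheme).
            resampled.extend(centered[(start + k) % n] for k in range(block_len))
        if sum(resampled[:n]) >= observed:
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    # Synthetic data for demonstration only; not the SJCc1 series.
    demo_rng = random.Random(42)
    benchmark = [demo_rng.gauss(0.0, 0.01) for _ in range(500)]
    strategy = [b + demo_rng.gauss(0.0005, 0.005) for b in benchmark]
    diff, p_value = paired_block_bootstrap_pvalue(strategy, benchmark)
    print(f"cumulative log-return difference: {diff:.4f}, p-value: {p_value:.4f}")
```

Resampling whole blocks instead of single observations preserves short-range serial dependence in the return differences, which is the usual motivation for a block scheme over an i.i.d. bootstrap when working with financial time series.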