SOYBEAN PRICE FORECASTING VIA A HYBRID LSTM-LLM ARCHITECTURE: STATISTICAL AND ECONOMIC EVALUATION OF BRAZILIAN AGRIBUSINESS NEWS SENTIMENT

Authors

  • Marco Antonio França Benjamim
  • Samuel Bellido Rodrigues
  • Lucas da Silva Ribeiro
  • Levi Lopes Teixeira
  • Tasia Hickmann
  • Jairo Marlon Correa

DOI:

https://doi.org/10.56238/revgeov16n13-049

Keywords:

Agricultural Commodities, Recurrent Neural Networks, Sentiment Analysis, Bayesian Optimization, Model Confidence Set

Abstract

Soybeans are Brazil's main agricultural commodity, and the volatility of their prices poses significant challenges for producers, traders, and policymakers, given the market's nonlinear dependence on exogenous factors such as weather conditions, trade policies, and the flow of news. This study investigates the extent to which incorporating textual sentiment extracted by LLMs specialized in agribusiness improves the predictive accuracy and economic value of LSTM models for forecasting the soybean futures contract (SJCc1). To this end, six architectures were compared empirically (a naïve benchmark, a pure LSTM, an LSTM with a frozen LLM producing scalar and probabilistic outputs, and end-to-end versions of both), using 3,261 price records and a corpus of 27,024 Brazilian agribusiness news items, with fine-tuning on 1,000 labeled news items and Bayesian hyperparameter optimization via TPE (Tree-structured Parzen Estimator). The statistical comparison used the Model Confidence Set (MCS) procedure at the 90% confidence level, complemented by a paired block bootstrap test for cumulative return. Only the LSTM+LLM architecture with probabilistic output entered the MCS alongside the naïve benchmark, and it was the only one to generate a cumulative return statistically significantly above buy-and-hold (58.27%; p ≈ 0.003; Sharpe ratio: 1.74), with a larger advantage in periods of high volatility. The predictive gain is concluded to stem from the specific combination of a specialized LLM and a probabilistic encoding of sentiment, not from textual integration per se.
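
The paired block bootstrap test for cumulative return and the Sharpe ratio mentioned in the abstract can be illustrated with a short sketch. This is not the authors' implementation: the function names, the circular moving-block scheme, the block length, the number of resamples, the use of log returns, the zero risk-free rate, and the 252-period annualization are all assumptions chosen for illustration.

```python
import numpy as np


def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns.

    Assumption: risk-free rate of zero and 252 trading periods per year.
    """
    returns = np.asarray(returns, dtype=float)
    std = returns.std(ddof=1)
    if std == 0.0:
        return 0.0
    return float(np.sqrt(periods_per_year) * returns.mean() / std)


def paired_block_bootstrap_pvalue(strategy, benchmark, block_len=10,
                                  n_boot=2000, seed=0):
    """One-sided p-value for H0: cumulative log return of the strategy
    does not exceed that of the benchmark.

    The test resamples the per-period DIFFERENCE between the two series,
    which is equivalent to resampling both series with the same block
    indices, so the pairing between them is preserved.
    """
    strategy = np.asarray(strategy, dtype=float)
    benchmark = np.asarray(benchmark, dtype=float)
    if strategy.shape != benchmark.shape:
        raise ValueError("series must have the same length")

    diff = strategy - benchmark        # paired per-period differences
    n = diff.size
    observed = diff.sum()              # gap in cumulative log return
    centered = diff - diff.mean()      # impose the null of zero mean gap

    rng = np.random.default_rng(seed)
    n_blocks = int(np.ceil(n / block_len))
    offsets = np.arange(block_len)

    exceed = 0
    for _ in range(n_boot):
        # Draw random block starts; blocks wrap around the series end.
        starts = rng.integers(0, n, size=n_blocks)
        idx = (starts[:, None] + offsets[None, :]) % n
        sample = centered[idx.ravel()[:n]]
        if sample.sum() >= observed:
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_boot + 1)
```

Called with two equal-length arrays of per-period log returns (the strategy and buy-and-hold), `paired_block_bootstrap_pvalue` returns a value in (0, 1]; small values suggest the strategy's cumulative return exceeds the benchmark's by more than serially dependent noise would explain. Resampling in blocks rather than single observations is what preserves short-range autocorrelation in the returns.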

Published

2026-06-08

How to Cite

Benjamim, M. A. F., Rodrigues, S. B., Ribeiro, L. da S., Teixeira, L. L., Hickmann, T., & Correa, J. M. (2026). PREVISÃO DE PREÇOS DA SOJA VIA ARQUITETURA HÍBRIDA LSTM-LLM: AVALIAÇÃO ESTATÍSTICA E ECONÔMICA DO SENTIMENTO DE NOTÍCIAS DO AGRONEGÓCIO BRASILEIRO. Revista De Geopolítica, 17(6), e2591. https://doi.org/10.56238/revgeov16n13-049