Extracting textual information from scene images is crucial for understanding scene content, and text detection is the foundation for text recognition and comprehension. Scene text detection remains a challenging task and is receiving increasing attention from researchers. This paper proposes an efficient arbitrary-shaped text detector, the Non-Local Pixel Aggregation Network (NL-PAN). The method uses a feature pyramid enhancement module and a feature fusion module for lightweight feature extraction, which preserves its speed advantage. It also introduces non-local operations, an attention mechanism that captures the inherent relationships between text pixels, to strengthen the feature extraction capability of the backbone network and improve detection accuracy. Additionally, a feature vector fusion module is designed to integrate features from different scales, enhancing the feature representation of scene text instances of varying size. The proposed method is compared with other methods on three scene text datasets and performs strongly in both speed and accuracy. On the ICDAR 2015 dataset, it improves the F-score by 1.5% over the best-performing competing method at a detection speed of 23.1 FPS. On the CTW1500 dataset, it improves the F-score by 1.8% over the best competing method at a detection speed of 71.8 FPS. On the Total-Text dataset, it improves the F-score by 0.8% and reaches a detection speed of 64.3 FPS, far faster than the other methods. Overall, the proposed method balances accuracy and real-time performance.
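To make the non-local operation concrete, the following is a minimal NumPy sketch of a generic non-local (self-attention) block applied to a single feature map. It is an illustration under stated assumptions, not the NL-PAN implementation: the function name `non_local_block`, the projection matrices `w_theta`, `w_phi`, `w_g`, and `w_out`, the softmax-normalized dot-product affinity, and the residual connection are all choices made here for the example. The abstract does not specify which variant the method uses or where it is placed in the backbone.

```python
import numpy as np


def non_local_block(x, w_theta, w_phi, w_g, w_out):
    """Generic non-local block on one feature map (illustrative sketch).

    x        : (C, H, W) feature map
    w_theta,
    w_phi,
    w_g      : (C_inner, C) projections, equivalent to 1x1 convolutions
    w_out    : (C, C_inner) projection back to the input channel count
    Returns the refined feature map (C, H, W) and the (N, N) attention
    matrix, where N = H * W.
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)             # (C, N): one column per pixel

    theta = w_theta @ flat                 # (C_inner, N) query embedding
    phi = w_phi @ flat                     # (C_inner, N) key embedding
    g = w_g @ flat                         # (C_inner, N) value embedding

    # Pairwise affinity between every pixel i and every pixel j.
    scores = theta.T @ phi                 # (N, N)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)        # each row sums to 1

    # Each output pixel is an attention-weighted sum over all pixels.
    y = g @ attn.T                         # (C_inner, N)

    # Project back and add the residual so the block refines the input.
    out = x + (w_out @ y).reshape(c, h, w)
    return out, attn


# Usage with random weights (hypothetical sizes chosen for the example).
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 6))          # C=8, H=4, W=6
w_theta = rng.standard_normal((4, 8)) * 0.1
w_phi = rng.standard_normal((4, 8)) * 0.1
w_g = rng.standard_normal((4, 8)) * 0.1
w_out = rng.standard_normal((8, 4)) * 0.1
refined, attention = non_local_block(features, w_theta, w_phi, w_g, w_out)
```

Because the attention matrix has one entry per pixel pair, its memory cost grows quadratically with the number of pixels. In a sketch like this, the block would therefore typically be applied to a low-resolution feature map.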