High health care costs and the persistent need for quality improvement in care delivery are increasingly motivating novel predictive studies in health care informatics. Surgical services affect both operating theatre costs and revenues and play a critical role in care quality. The efficiency of such units relies heavily on effective operational planning and inventory management. A key ingredient of these planning activities is the structured and unstructured data available before the day of surgery from electronic health records and other information systems. Structured data alone is not sufficient; unstructured data, such as the text of procedure descriptions and notes, provide additional information. To utilize textual information effectively through text mining, textual features should be easily identifiable, i.e., free of typographical errors and ad hoc abbreviations. While numerous spelling correction and abbreviation identification tools exist, they are not suitable for surgical medical text because they require a dictionary and cannot accommodate ad hoc words such as abbreviations. This project proposes a novel preprocessing framework for surgical text data that detects misspellings and abbreviations before any text mining or predictive modeling is applied. The proposed approach helps extract the most salient text features from the unstructured principal procedure descriptions and additional notes by effectively reducing the dimension of the raw feature set. The transformed text feature set thus improves subsequent prediction tasks in surgery units. We test and validate the proposed approach using datasets from multiple hospitals’ surgical departments and benchmark feature sets.