in 2026 things got a bit more interesting for db admins out there. oracle machine learning now supports vectorizing records via pca, which is awesome because it opens up clustering and similarity searches on your datasets ⚡
the catch? these algorithms struggle when you toss in some text-heavy columns like customer reviews or descriptions does anyone else run into this issue regularly?
ive been experimenting with a workaround by pre-processing natural language fields to fit better within the vector model. tried stemming, lemmatization - tons of stuff - but none felt perfect yet
any tips on how you guys handle these mixed datasets?
https://dzone.com/articles/similarity-search-tabular-data-natural-language-fields