[ 🏠 Home / 📋 About / 📧 Contact / 🏆 WOTM ] [ b ] [ wd / ui / css / resp ] [ seo / serp / loc / tech ] [ sm / cont / conv / ana ] [ case / tool / q / job ]

/q/ - Q&A Central

Help, troubleshooting & advice for practitioners


31d9e No.1859

i am trying to automate my data cleanup, but the script keeps dying when i run it on files larger than 2 GB. every time it hits a certain row, memory usage spikes and the whole process terminates. i tried chunksize=1000 in pandas, but it still struggles with memory allocation during the merge step.
>it just disappears without an error message
is there a more efficient way to handle datasets this large without needing a bigger server? i thought switching to dask might be the answer, but i am not sure it is worth the extra complexity for this specific task. any tips on fixing this, or an alternative library i should look into, would be great.

b4c82 No.1860


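the issue is likely that u are loading both dataframes into memory at once during that merge step, and the merged result gets allocated on top of them, which effectively doubles or triples ur footprint. chunksize only limits how much is read per iteration, so it won't help if the chunks end up concatenated back into one frame b4 the merge. the process vanishing w/o an error usually means the OS killed it for running out of memory (on linux, the OOM killer), so python never gets a chance to raise anything. polars tends to handle larger datasets more gracefully bc its lazy API can run queries on a streaming engine that processes data in batches w/o loading the whole thing. are u performing any complex transformations on the columns b4 the join happens?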
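
if u want to stay on pandas first, here is a rough sketch of a chunked merge. it assumes things the thread doesn't confirm: that ur inputs are CSV files, that only one of the two tables is huge, and that the smaller one fits in memory. the function name merge_in_chunks, the paths, and the key column are all made up for illustration. the idea is that each chunk is joined and written out immediately, so the full merged frame never exists in memory.

```python
import pandas as pd


def merge_in_chunks(big_path, small_path, out_path, key, chunksize=100_000):
    """Left-join a large CSV against a small one, one chunk at a time.

    Hypothetical helper: assumes CSV inputs and that the table at
    small_path fits comfortably in memory.
    Returns the number of rows written to out_path.
    """
    # Load the smaller table once and reuse it for every chunk.
    small = pd.read_csv(small_path)

    rows_written = 0
    first_chunk = True

    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of loading the whole file.
    for chunk in pd.read_csv(big_path, chunksize=chunksize):
        merged = chunk.merge(small, on=key, how="left")

        # Write the first chunk with a header, then append the rest,
        # so no merged chunk has to stay in memory.
        merged.to_csv(
            out_path,
            mode="w" if first_chunk else "a",
            header=first_chunk,
            index=False,
        )
        rows_written += len(merged)
        first_chunk = False

    return rows_written
```

usage would look like merge_in_chunks("big.csv", "lookup.csv", "merged.csv", key="id"). this won't work as-is if both tables are too big for memory or if the join needs rows from different chunks to see each other. in that case something like polars or dask is probably the better fit.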


