简体   繁体   中英

Fastest way to convert object to string Python

I have a ~250,000 row dataset I am pulling from Oracle DB. The main data I am concerned with is a text field that is pulled as a HUGECLOB object. Storing my data to a CSV took quite a bit of time, so I decided to switch to Feather via: Pandas read_csv speed up There was some other question that used feather too, but I cannot find it. I tried to_hdf , but for some reason that did not work. Either way, to get my query results to work with Feather, the text column must be converted to string. So I have:

SQLquery = ('SELECT*')
datai = pd.read_sql(SQLquery, conn)
print("Query Passed, start date")
datai['REPORTDATE'] = pd.to_datetime(datai['REPORTDATE'], format='%m-%d-%Y')
print("Row done, string")
datai['LOWER(LD.LDTEXT)'] = datai['LOWER(LD.LDTEXT)'].apply(str)
print("Data Retrieved")
print("To feather start")
datai.to_feather(r'C:\Users\asdean\Documents\Nuclear Text\dataStore\rawData2.feather')
print("Done with feather")

Note: I put a bunch of print statements because I was trying to figure out where it got hung up.

The text column is identified as datai['LOWER(LD.LDTEXT)'] . Some of the rows contain quite a bit of text (~ a couple paragraphs). The string conversion TAKES FOREVER (or may not even be completing). I did not have this problem when I read it from an old CSV (old data, no longer use, we updated the query, etc).

I have tried all the common ways of doing this with astype(str) , map(str) , apply(str) , values.astype(str) with no success of speeding it up to a reasonable (less than 1 hour) pace. Is there a way to do faster object to string conversions? Is there a library I am missing here? Is there a faster way through Oracle/HUGECLOB? How can I speed this up?

I'm assuming the data is stored in the DB as the Oracle CLOB datatype. How big is the data? If the strings are < 1GB make sure they are being fetched as strings (not streamed) in the lower layer https://cx-oracle.readthedocs.io/en/latest/user_guide/lob_data.html#fetching-lobs-as-strings-and-bytes

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM