
Is there a way to create a single-column pandas DataFrame from a list without copying the list?

Suppose I have this code:

import pandas as pd

mylist = [item for item in range(100000)]
df = pd.DataFrame()
df["col1"] = mylist

Is the data in mylist copied when it is assigned to df["col1"]? If so, is there a way to avoid this copy?

Edit: My list in this case is a list of strings. One thing I am getting from these answers is that if I instead create a NumPy array of these strings, no data duplication will occur when I call df["col1"] = mynparray. Is that right?

When you assign your list to a series, a new NumPy array is created to hold the values. This data structure permits vectorised computations for numeric types, because numeric series are laid out in contiguous memory blocks. See "Why NumPy instead of Python lists?" for more details.

Therefore, you will need enough memory to hold duplicate data. This is unavoidable. There is no way to "convert" a list into a Pandas series in place.
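One way to see that the column holds its own data is to change the list after the assignment and check that the DataFrame is unaffected. This is only a small sketch with a five-element list; the name first_value is introduced here for illustration.

```python
import pandas as pd

# A short list keeps the output readable; the same holds for larger lists.
mylist = list(range(5))

df = pd.DataFrame()
df["col1"] = mylist  # pandas builds a new array from the list's values

# Mutate the original list after the assignment.
mylist[0] = 999

# The column still holds the value it was given at assignment time.
first_value = int(df["col1"].iloc[0])
print(first_value)
```

Because the column is a separate array, the list and the DataFrame both occupy memory until the list is released.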

Note: the above does not describe what happens when you assign a NumPy array to a series.
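For the NumPy case, a rough way to check whether a series reuses an existing array's buffer is np.shares_memory. The sketch below assumes a numeric array and passes copy=False to the pd.Series constructor, which is a different route from the df["col1"] = ... assignment in the question. Whether memory is actually shared can depend on the pandas version and the dtype, so treat the printed result as something to verify on your own setup rather than a guarantee.

```python
import numpy as np
import pandas as pd

# Assumption: a numeric array, so pandas can use the buffer directly.
arr = np.arange(5, dtype="int64")

# copy=False asks pandas not to copy the input array if it can avoid it.
s = pd.Series(arr, copy=False, name="col1")

# Compare the original buffer with the array backing the series.
shares = np.shares_memory(arr, s.to_numpy())
print(shares)
```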

Just a thought: if memory is a concern, can you delete the list after creating df?

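import pandas as pd

mylist = [item for item in range(100000)]
# Build the single-column frame; the values are copied into a new array.
df = pd.Series(mylist).to_frame()
# Drop the name; the list is freed once nothing else references it.
del mylist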
