Hi, I have a problem converting a list of objects to a list of integers. The objects are in the "stopsequence" column of the pandas DataFrame "Kanten". I get this column after importing a CSV file and cleaning the data. I am using Python 3.X.
I am a Python newbie, maybe that's part of the problem here.
import pandas as pd
import numpy as np
import os
import re
import ast
orgn_csv = pd.read_csv(r"Placeholder path for csv file")
df = orgn_csv.dropna()
Kanten = pd.DataFrame({"stopsequence" : df.stopsequence})
# In between is a block in which I use regular expressions for data cleaning purposes.
# I left the data cleaning block out to make the post shorter
Kanten.stopsequence = Kanten.stopsequence.str.split (',')
print (Kanten.head())
print (Kanten.stopsequence.dtype)
This gives the following output:
stopsequence
2 [67, 945, 123, 122, 996, 995, 80, 81, 184, 990...
3 [67, 945, 123, 122, 996, 995, 80, 81, 184, 990...
4 [67, 945, 123, 122, 996, 995, 80, 81, 184, 990...
5 [67, 945, 123, 122, 996, 995, 80, 81, 184, 990...
6 [67, 945, 123, 122, 996, 995, 80, 81, 184, 990...
object
I am looking for a way to convert the lists, which contain objects, into lists of integers. I searched Stack Overflow intensively and tried a number of different approaches, but none of them worked. First I tried:
Kanten.stopsequence = Kanten.stopsequence.astype(str).astype(int)
This returns:
ValueError: invalid literal for int() with base 10:
I also adapted the following post, using atoi instead of atof:
Kanten.stopsequence.applymap(atoi)
This returns:
AttributeError: 'Series' object has no attribute 'applymap'
Kanten.stopsequence = list(map(int, Kanten.stopsequence))
This returns:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
Kanten.stopsequence = Kanten.stopsequence.apply(ast.literal_eval)
This returns:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
Does anybody see a solution for this? I am not sure whether it is a complicated case or whether I just lack programming experience. If possible, a short explanation would be helpful, so that I can find a solution myself next time. Thank you in advance.
A pandas Series can be trivially converted to a list, and a list of lists can be given as input to create a DataFrame.
I think this could help:
splitted = pd.DataFrame(Kanten.stopsequence.str.split(',').tolist(), index=Kanten.index).astype(int)
This gives you a new DataFrame with the same index as the original one, but where each element is in its own column.
If relevant, you could then concatenate the new columns with the original frame:
pd.concat([Kanten, splitted], axis=1)
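As a minimal sketch of this idea on made-up data, str.split with expand=True puts each element in its own column directly. The frame kanten_demo is a hypothetical stand-in for Kanten and holds comma-separated strings, as the column would look before the split:

```python
import pandas as pd

# Hypothetical stand-in for Kanten; the values are invented for illustration.
kanten_demo = pd.DataFrame(
    {"stopsequence": ["67,945,123", "67,945,122"]}, index=[2, 3]
)

# expand=True returns a DataFrame with one column per split element.
splitted_demo = kanten_demo["stopsequence"].str.split(",", expand=True).astype(int)

# Join the new integer columns to the original frame on the shared index.
combined_demo = pd.concat([kanten_demo, splitted_demo], axis=1)
print(combined_demo)
```

This assumes every row has the same number of elements. With rows of different lengths the shorter ones are padded with missing values, and astype(int) would then fail.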
The error message from your second attempt tells you that Kanten.stopsequence is a Series, not a DataFrame. To convert it, you would need something like:
list_of_lists = Kanten.stopsequence.to_numpy(dtype='int32').tolist()
Note that for your data this will create a nested 2D structure. To access the first integer from the first row, you would need to write list_of_lists[0][0].
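If the cells already hold lists of strings, as they do in the question after str.split(','), a sketch along the following lines converts each element individually. The sample data here is invented and kanten_demo is only a hypothetical stand-in for Kanten:

```python
import pandas as pd

# Hypothetical stand-in for Kanten after str.split(','):
# each cell is a list of strings (values are made up).
kanten_demo = pd.DataFrame({"stopsequence": [["67", "945", "123"], ["67", "945"]]})

# Convert every element of every row's list to int.
# Rows are allowed to have different lengths.
kanten_demo["stopsequence"] = kanten_demo["stopsequence"].apply(
    lambda stops: [int(s) for s in stops]
)

# Plain nested Python list: one inner list per row.
list_of_lists = kanten_demo["stopsequence"].tolist()
print(list_of_lists[0][0])
```

The column dtype stays object because each cell is still a Python list; only the elements inside the lists become integers.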
This is how I would approach pulling the last column of a DataFrame into a list of ints.
Let's say the .csv is located in the same directory as your .py script and it's called kanten.csv. The column you're looking for is stopsequence.
import os
import pandas as pd
path=os.getcwd()
filename = 'kanten.csv'
filepath = os.path.join(path, filename)
kanten = pd.read_csv(filepath)
# Use a name other than "list" so the built-in list type is not shadowed.
stop_list = kanten['stopsequence'].astype(int).tolist()
In the last line, the stopsequence column is pulled from kanten, the values are cast to integers, then the column is converted to a standard Python list object.
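This approach only works when each cell of the column holds a single number rather than a comma-separated sequence. A small self-contained sketch, using an in-memory CSV with invented values in place of kanten.csv:

```python
import io
import pandas as pd

# Made-up CSV content standing in for kanten.csv; one integer per cell.
csv_text = "stopsequence\n67\n945\n123\n"
kanten_demo = pd.read_csv(io.StringIO(csv_text))

# Cast the column to int, then convert it to a plain Python list.
stop_ids = kanten_demo["stopsequence"].astype(int).tolist()
print(stop_ids)  # [67, 945, 123]
```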