Convert a dataframe column from string to list of numbers

I have created the following dataframe from a csv file:

id      marks
5155    1,2,3,,,,,,,,
2156    8,12,34,10,4,3,2,5,0,9
3557    9,,,,,,,,,,
7886    0,7,56,4,34,3,22,4,,,
3689    2,8,,,,,,,,

It is indexed on id. The values in the marks column are strings. I need to convert them to lists of numbers so that I can iterate over them and use them as index numbers for another dataframe. How can I convert them from a string to a list? I tried to add a new column and convert them based on "Add a columns in DataFrame based on other column", but it failed:

df = df.assign(new_col_arr=lambda x: np.fromstring(x['marks'].values[0], sep=',').astype(int))

Here's one way to do it:

df = df.assign(new_col_arr=df['marks'].str.split(','))

# convert to int
df['new_col'] = df['new_col_arr'].apply(lambda x: list(map(int, [i for i in x if i != ''])))
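
For reference, the split-then-convert approach above can be run end to end on a few rows from the question. This is only a sketch: the sample frame is rebuilt by hand here (an assumption, since the original CSV is not shown), and the column names new_col_arr and new_col follow the answer.

```python
import pandas as pd

# Sample rows copied from the question; 'id' is the index, 'marks' holds strings.
df = pd.DataFrame(
    {"marks": ["1,2,3,,,,,,,,", "8,12,34,10,4,3,2,5,0,9", "9,,,,,,,,,,"]},
    index=pd.Index([5155, 2156, 3557], name="id"),
)

# Split on commas; trailing commas leave empty strings in each list.
df = df.assign(new_col_arr=df["marks"].str.split(","))

# Drop the empty strings, then convert the remaining items to int.
df["new_col"] = df["new_col_arr"].apply(lambda x: [int(i) for i in x if i != ""])

print(df["new_col"].tolist())
# [[1, 2, 3], [8, 12, 34, 10, 4, 3, 2, 5, 0, 9], [9]]
```

Filtering out the empty strings before calling int is what avoids a ValueError on rows that end in a run of commas.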

I presume that you want to create a NEW dataframe, since the number of items is different from the number of rows. I suggest the following:

import numpy as np
import pandas as pd

# source data
df = pd.DataFrame({'id': [5155, 2156, 7886],
                   'marks': ['1,2,3,,,,,,,,', '8,12,34,10,4,3,2,5,0,9', '0,7,56,4,34,3,22,4,,,']})

# create a dictionary from df, mapping each id to an integer array:
dd = {row['id']: np.fromstring(row['marks'], dtype=int, sep=',') for _, row in df.iterrows()}

{5155: array([1, 2, 3]),
 2156: array([ 8, 12, 34, 10,  4,  3,  2,  5,  0,  9]),
 7886: array([ 0,  7, 56,  4, 34,  3, 22,  4])}

# here you pad the lists inside dictionary so that they have equal length
...

# convert dd to DataFrame:
df2 = pd.DataFrame(dd)
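
The padding step is left elided in the answer above. One possible way to get equal-length columns, offered here as an assumption rather than the answerer's own method, is to wrap each array in a pd.Series so that pandas aligns the values by position and fills the missing tail with NaN. The dictionary below is typed in by hand from the output printed above.

```python
import numpy as np
import pandas as pd

# Dictionary of unequal-length integer arrays, as printed in the answer above.
dd = {
    5155: np.array([1, 2, 3]),
    2156: np.array([8, 12, 34, 10, 4, 3, 2, 5, 0, 9]),
    7886: np.array([0, 7, 56, 4, 34, 3, 22, 4]),
}

# Wrapping each array in a Series lets pandas align on position and
# fill the missing tail of the shorter columns with NaN.
df2 = pd.DataFrame({key: pd.Series(values) for key, values in dd.items()})

print(df2.shape)
```

Columns that receive NaN padding are stored as floats, so cast back to int after dropping the NaN values if integer positions are needed.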

I found two similar alternatives:

1.

df['marks'] = df['marks'].str.split(',').map(lambda num_str_list: [int(num_str) for num_str in num_str_list if num_str])

2.

df['marks'] = df['marks'].map(lambda arr_str: [int(num_str) for num_str in arr_str.split(',') if num_str])
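
Once the marks column holds lists of integers, the goal stated in the question (using the numbers as index numbers for another dataframe) might look like the sketch below. The second frame, named other here, and its value column are hypothetical, and the numbers are assumed to be row positions, so iloc is used.

```python
import pandas as pd

# Two sample rows from the question; 'id' is the index.
df = pd.DataFrame(
    {"marks": ["1,2,3,,,,,,,,", "9,,,,,,,,,,"]},
    index=pd.Index([5155, 3557], name="id"),
)

# Alternative 2 from above: split each string and keep the non-empty items.
df["marks"] = df["marks"].map(
    lambda arr_str: [int(num_str) for num_str in arr_str.split(",") if num_str]
)

# Hypothetical second frame with ten rows; not part of the original question.
other = pd.DataFrame({"value": list("abcdefghij")})

# Use each list of integers as row positions in the other frame.
selected = {
    row_id: other.iloc[positions]["value"].tolist()
    for row_id, positions in df["marks"].items()
}
print(selected)
```

If the numbers are index labels rather than positions, other.loc[positions] would be the call to use instead.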
