Modifying column of 2D list while iterating over it in Python
I am trying to write a function that converts all the non-numerical columns in a data set to numerical form.
The data set is a list of lists.
Here is my code:
def handle_non_numerical_data(data):
    def convert_to_numbers(data, index):
        items = []
        column = [line[0] for line in data]
        for item in column:
            if item not in items:
                items.append(item)
        [line[0] = items.index(line[0]) for line in data]
        return new_data
    for value in data[0]:
        if isinstance(value, str):
            convert_to_numbers(data, data[0].index(value))
Apparently [line[0] = items.index(line[0]) for line in data] is not valid syntax, and I can't figure out how to modify the first column of data while iterating over it.
I can't use numpy because the data will not be in numerical form until after this function is run.
How do I do this, and why is it so complicated? I feel like this should be way simpler than it is...
In other words, I want to turn this:
[[M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
[M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
[F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]
into this:
[[0,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
[0,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
[1,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]
Note that the first column was changed from strings to numbers.
data = [['M',0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
['M',0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
['F',0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]
values = {'M': 0, 'F': 1}
new_data = [[values.get(val, val) for val in line] for line in data]
new_data
Output:
[[0, 0.455, 0.365, 0.095, 0.514, 0.2245, 0.101, 0.15, 15],
[0, 0.35, 0.265, 0.09, 0.2255, 0.0995, 0.0485, 0.07, 7],
[1, 0.53, 0.42, 0.135, 0.677, 0.2565, 0.1415, 0.21, 9]]
You can take advantage of Python dictionaries and their get method.
These are the values for the strings:
values = {'M': 0, 'F': 1}
You can also add more strings, such as 'I', with a corresponding value.
If the string is a key in values, you will get its value from the dict:
>>> values.get('M', 'M')
0
Otherwise, you will get the original value:
>>> values.get(10, 10)
10
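Note that the comprehension above applies get to every cell in every row, so a mapped string would be replaced wherever it appears. If only one column should change, a minimal sketch along these lines may help. The helper name encode_column and the shortened sample rows are illustrative assumptions, not part of the original answer:

```python
def encode_column(rows, column, values):
    """Return new rows with only `column` mapped through `values`.

    Cells in other columns are copied unchanged, and values missing
    from the mapping are kept as they are.
    """
    return [
        [values.get(cell, cell) if i == column else cell
         for i, cell in enumerate(row)]
        for row in rows
    ]

# Shortened sample rows (hypothetical, for illustration only)
data = [['M', 0.455, 15], ['F', 0.53, 9]]
values = {'M': 0, 'F': 1}

new_data = encode_column(data, 0, values)
print(new_data)  # [[0, 0.455, 15], [1, 0.53, 9]]
```

Because this builds new lists, the original data is left untouched.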
Rather than indexing (I'm not sure how that was supposed to work in your example), you can create a dictionary that maps letters to numbers. Something like this should work.
raw_data = [['M',0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
            ['M',0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
            ['F',0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]

def handle_non_numerical_data(data):
    mapping = {'M': 0, 'F': 1, 'I': 2}
    for item in data:  # iterate over the argument, not the global raw_data
        if isinstance(item[0], str):
            item[0] = mapping.get(item[0], -1)  # Returns -1 if letter not found
    return data

run = handle_non_numerical_data(raw_data)
print(run)
This answer uses a dict to store the coding from str to int. The dict can be preloaded, and it can be inspected after the data has been replaced.
# MODIFIES DATA IN PLACE
data = [['M',0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15],
        ['M',0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7],
        ['F',0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9]]
coding_dict = {}  # can also preload this {'M': 0, 'F':1}
for row in data:
    if row[0] not in coding_dict:
        coding_dict[row[0]] = len(coding_dict)
    row[0] = coding_dict[row[0]]
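The loop above works where the comprehension in the question does not because assignment is a statement in Python, and a list comprehension only accepts an expression. A plain for loop is the usual way to modify rows in place. The question also asks for every non-numerical column, not just the first, so here is one possible extension of the same idea that keeps a separate coding dict per column. The function name encode_string_columns and the sample rows are assumptions made for illustration:

```python
def encode_string_columns(data):
    """Replace string cells with integer codes, modifying data in place.

    Returns a dict mapping column index -> {string: code} so the coding
    can be inspected afterwards.
    """
    codings = {}
    for row in data:
        for i, cell in enumerate(row):
            if isinstance(cell, str):
                column_codes = codings.setdefault(i, {})
                if cell not in column_codes:
                    # The next unused code is the number of strings seen so far
                    column_codes[cell] = len(column_codes)
                row[i] = column_codes[cell]
    return codings

# Hypothetical sample with two string columns
data = [['M', 0.455, 'a'],
        ['M', 0.35, 'b'],
        ['F', 0.53, 'a']]

codings = encode_string_columns(data)
print(data)     # [[0, 0.455, 0], [0, 0.35, 1], [1, 0.53, 0]]
print(codings)  # {0: {'M': 0, 'F': 1}, 2: {'a': 0, 'b': 1}}
```

Codes are assigned in order of first appearance within each column, so the same string can receive different codes in different columns.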