How to assign a label to a specific item in a list of lists

I'm computing the minimum value of each row of a CSV file (which I have turned into a normalised list of lists), and I'm having trouble matching each minimum with its original CSV title and header: the country name in the first column and the column header, both of which I removed to make the normalisation easier. Below is what I am working with:

My normalised list of lists, norm_row_list (each sublist is a row from my CSV file):

[[0.1442722616425349, 0.011387368532690107, 1.0, 0.01016955650916749, 0.0, 0.007007584956949359],
 [0.13618895033835154, 0.009739033790403672, 1.0, 0.011358919624000634, 0.0, 0.007134183651352274],
 [0.14773629092116417, 0.015197531681779487, 1.0, 0.009581175298448931, 0.0],
 [0.1480962502699423, 0.01613878131072959, 1.0, 0.015035304680545728, 0.0, 0.007260689113737381],
 [0.1404716315950755, 0.012720171642799673, 1.0, 0.011429478548387115, 0.0, 0.005808759430147285],
 [0.14362441283729363, 0.008943844575022054, 1.0, 0.008400152860935555, 0.0, 0.0020931326050634305]]

I calculate the minimum of each sublist (row) using:

min_list = [min(p) for p in norm_row_list]

and, as expected, the output is:

[0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

But instead of printing the numbers, I want to print which column each number came from (each column has a string header; e.g. the first 0.0 comes from my second-last column, [5], called Generosity). I also want to print the title, which is in the first column (the first row of that column is Afghanistan).

country,      header2,     header3,     header4,     header5,     header6
Australia     1.0          0.3435353    0.0          0.23124234   0.35334
Algeria       0.343434     0.434343     1.0          0.0          0.344343

So I need to compute the minimum of each row and print it out like this:

Australia's happiness is bound by its header4
Algeria's happiness is bound by its header5

Let headers be the column headers and list_of_lists all the other rows (including the country column). Then the following does what you want:

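headers = ['country', ...]

for mList in list_of_lists:
    cur_min = float('inf')  # larger than any value, so the first item always replaces it
    min_index = 0
    # mList[0] is the country name; col_index counts from the first numeric column
    for col_index, item in enumerate(mList[1:]):
        if item < cur_min:
            cur_min = item
            min_index = col_index
    # +1 because headers[0] is 'country'
    print(mList[0] + "'s happiness is bound by its " + headers[min_index + 1])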

If the minimum is always 0, then the above code can be simplified.
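A minimal sketch of that simplification, assuming every row contains an exact 0.0 (as with row-wise min-max scaling, where each row's smallest value becomes 0). The headers 'a' to 'e' and the two rows are placeholder example values. Passing a start argument of 1 to list.index skips the country name, so the returned position indexes headers directly:

```python
# Assumes every row contains an exact 0.0 (e.g. after row-wise min-max scaling)
headers = ['country', 'a', 'b', 'c', 'd', 'e']
list_of_lists = [['Australia', 1.0, 0.3435353, 0.0, 0.23124234, 0.35334],
                 ['Algeria', 0.343434, 0.434343, 1.0, 0.0, 0.344343]]

messages = []
for mList in list_of_lists:
    # index(0.0, 1) starts the search at position 1, skipping the country name;
    # the returned position lines up with headers directly, so no +1 is needed.
    # It raises ValueError if a row has no exact 0.0.
    zero_index = mList.index(0.0, 1)
    messages.append(mList[0] + "'s happiness is bound by its " + headers[zero_index])

print("\n".join(messages))
```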

Example values for the variables mentioned above:

list_of_lists = [['Australia',1.0,0.3435353,0.0,0.23124234,0.35334],['Algeria',0.343434,0.434343,1.0,0.0,0.344343]]
headers = ['country','a','b','c','d','e']
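With those values, the manual scan can also be handed to the built-in min() by ranging over positions and comparing the values they point to. This is a sketch, not the answerer's code; min_header is a hypothetical helper name:

```python
headers = ['country', 'a', 'b', 'c', 'd', 'e']
list_of_lists = [['Australia', 1.0, 0.3435353, 0.0, 0.23124234, 0.35334],
                 ['Algeria', 0.343434, 0.434343, 1.0, 0.0, 0.344343]]


def min_header(row, headers):
    """Return the header of the smallest numeric value in row (row[0] is the country)."""
    # range(1, len(row)) skips the country; key=row.__getitem__ compares the values,
    # so min() returns the *position* of the smallest one, already aligned with headers
    min_index = min(range(1, len(row)), key=row.__getitem__)
    return headers[min_index]


for mList in list_of_lists:
    print(mList[0] + "'s happiness is bound by its " + min_header(mList, headers))
```

min() keeps the first position on ties, which matches the strict < comparison in the loop above.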

If you are OK with using pandas, this can be done pretty easily. First import your CSV into pandas as a DataFrame with:

import pandas as pd
df = pd.read_csv('filename.csv')  # look up other posts on reading your CSV into a DataFrame

Then use the code below:

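for index, row in df.iterrows():
    # 'country,' matches the column name in the input below; relies on each row's minimum being exactly 0
    print(row['country,'] + ' happiness is bound by its ' + df.columns[row.values == 0][0])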

My input is the DataFrame below:

    country,    header2,    header3,    header4,    header5
0   Australia   1.000000    0.343535    0.0     0.231242
1   Algeria     0.343434    0.434343    1.0     0.000000

Output:

Australia happiness is bound by its header4,
Algeria happiness is bound by its header5
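A sketch of a pandas variant that does not assume the minimum is exactly 0: with country as the index, DataFrame.idxmin(axis=1) returns the column label of each row's smallest value. The DataFrame is built in memory from the question's sample rows and assumes a plain 'country' column name (no trailing comma):

```python
import pandas as pd

# The question's sample rows, built in memory; with a real file use pd.read_csv('filename.csv')
df = pd.DataFrame({
    'country': ['Australia', 'Algeria'],
    'header2': [1.0, 0.343434],
    'header3': [0.3435353, 0.434343],
    'header4': [0.0, 1.0],
    'header5': [0.23124234, 0.0],
    'header6': [0.35334, 0.344343],
})

# With country as the index, every remaining column is numeric;
# idxmin(axis=1) gives, for each row, the label of the column holding its smallest value
bound_by = df.set_index('country').idxmin(axis=1)

for country, header in bound_by.items():
    print(f"{country}'s happiness is bound by its {header}")
```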

I'm not sure I understand the question fully, but this would let you read in a CSV and output what you're looking for without having to put it into a list of lists. I avoided libraries like pandas because they're huge and probably overkill here, though they are certainly the way to go for more complex work.

For a CSV with a structure like this:

country,header1,header2,header3,header4,header5,header6
Algeria,1,2,55,3,2,3
Australia,33,2,8,3,99,0
UnitedStates,9,8,7,6,5,4

you could use this code:

import csv

with open('file.csv', newline='') as f:  # the csv module expects newline=''
    reader = csv.DictReader(f)
    for row in reader:
        # DictReader yields strings, so convert every column after 'country' to float
        # (do whatever normalisation to the values you need here)
        values = [float(v) for v in list(row.values())[1:]]
        minval = min(values)
        i = values.index(minval)          # position among the numeric columns
        h = list(row.keys())[i + 1]       # +1 skips the 'country' header
        print(f"{row['country']}'s happiness is bound by its {h}")
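If the index bookkeeping feels fragile, a sketch of an alternative is to map each header to its converted value and let min() pick the key with the smallest value. It runs on the sample CSV through io.StringIO; bound_by_csv is a hypothetical name, and the first column is assumed to be called country:

```python
import csv
import io

# Sample CSV from above; with a real file use open('file.csv', newline='') instead
sample = """country,header1,header2,header3,header4,header5,header6
Algeria,1,2,55,3,2,3
Australia,33,2,8,3,99,0
UnitedStates,9,8,7,6,5,4
"""


def bound_by_csv(f):
    """Yield (country, header of the smallest value) for each CSV row."""
    for row in csv.DictReader(f):
        country = row.pop('country')  # remaining keys are the numeric headers
        values = {header: float(v) for header, v in row.items()}
        # min over the keys, compared by their numeric values
        yield country, min(values, key=values.get)


results = list(bound_by_csv(io.StringIO(sample)))
for country, header in results:
    print(f"{country}'s happiness is bound by its {header}")
```

Ties go to the leftmost column, because DictReader keeps the file's column order.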

If you have to work from a list of lists, you can put the headers into a variable, get the index of the minimum value with list.index, and use that index to look up the matching header, the same way i is used in the snippet above:

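headers = ['header1', 'header2', ...]  # numeric column headers only, without 'country'
countries = ['a', 'bunch', 'o', 'countries', ...]  # country names, in the same order as the rows
for ci,row in enumerate(list_of_lists):  # each row holds only the numeric values
    minval = min(row)
    i = row.index(minval)
    h = headers[i]
    print(f"{countries[ci]}'s happiness is bound by its {h}")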
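A runnable sketch of that snippet using the question's sample rows, with zip pairing each country with its row instead of indexing countries by ci. Here headers and countries hold the question's example names, and the rows contain only the numeric values:

```python
# Example data: headers exclude 'country' because each row holds only the numeric values
headers = ['header2', 'header3', 'header4', 'header5', 'header6']
countries = ['Australia', 'Algeria']
list_of_lists = [[1.0, 0.3435353, 0.0, 0.23124234, 0.35334],
                 [0.343434, 0.434343, 1.0, 0.0, 0.344343]]

messages = []
# zip pairs each country with its row, so no separate counter (ci) is needed
for country, row in zip(countries, list_of_lists):
    h = headers[row.index(min(row))]
    messages.append(f"{country}'s happiness is bound by its {h}")

print("\n".join(messages))
```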

I hope I caught what you were shooting for. Good luck!

I think you mean:

header = ['country', 'header2', 'header3', 'header4', 'header5', 'header6']

[header[p.index(min(p)) + 1] for p in norm_row_list]  # +1: header[0] is 'country', which p doesn't contain

# ['header6', 'header6', 'header6', 'header6', 'header6', 'header6']

Given the norm_row_list you provided above.
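Checking that one-liner against the norm_row_list from the question: every row's minimum (0.0) sits at position 4, including the third row, which has only five values. Since header starts with 'country' and the rows do not, the matching label is header[position + 1]; the header names are the placeholders from header above:

```python
header = ['country', 'header2', 'header3', 'header4', 'header5', 'header6']
norm_row_list = [
    [0.1442722616425349, 0.011387368532690107, 1.0, 0.01016955650916749, 0.0, 0.007007584956949359],
    [0.13618895033835154, 0.009739033790403672, 1.0, 0.011358919624000634, 0.0, 0.007134183651352274],
    [0.14773629092116417, 0.015197531681779487, 1.0, 0.009581175298448931, 0.0],
    [0.1480962502699423, 0.01613878131072959, 1.0, 0.015035304680545728, 0.0, 0.007260689113737381],
    [0.1404716315950755, 0.012720171642799673, 1.0, 0.011429478548387115, 0.0, 0.005808759430147285],
    [0.14362441283729363, 0.008943844575022054, 1.0, 0.008400152860935555, 0.0, 0.0020931326050634305],
]

# Position of each row's minimum, counted within the numeric values only
positions = [p.index(min(p)) for p in norm_row_list]
# header[0] is 'country', which has no counterpart in p, hence the +1
labels = [header[pos + 1] for pos in positions]

print(positions)
print(labels)
```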
