Replacing cell values in each row of a pandas column using a binarizer and a for loop

Question

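I need some help. I am trying to transform a column in a .csv file in which some cells are empty and others contain a list of categories, like this: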

tdaa_matParent,tdaa_matParentQty
[],[]
[],[]
[],[]
[BCA_Aluminum],[1.3458]
[BCA_Aluminum],[1.3458]
[BCA_Aluminum],[1.3458]
[BCA_Aluminum],[1.3458]
[],[]
[Dye Penetrant Solution, BCA_Aluminum],[0.002118882, 1.3458]

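So far I have only managed to binarize the first column (tdaa_matParent); I have not been able to replace each 1 with its corresponding quantity value. This is what I have: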

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

s = materials['tdaa_matParent']
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(s),columns=mlb.classes_)

BCA_Aluminum,Dye Penetrant Solution,tdaa_matParentQty
0,0,[]
0,0,[]
0,0,[]
1,0,[1.3458,0]
1,0,[1.3458,0]
1,0,[1.3458,0]
1,0,[1.3458,0]
0,0,[]
1,1,[1.3458,0.002118882]

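What I actually want is a new set of columns, one per category (i.e. BCA_Aluminum and Dye Penetrant Solution). Wherever one of those columns is populated, the 1 should be replaced by the matching value from the second column (tdaa_matParentQty).

For example: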

BCA_Aluminum,Dye Penetrant Solution
0,0
0,0
0,0
1.3458,0
1.3458,0
1.3458,0
1.3458,0
0,0
1.3458,0.002118882

Answer 1

Thanks! I built another approach that also works (although it is a bit slower). If you have any suggestions, feel free to share :)

df_matParent_with_Qty = pd.DataFrame()

# Iterate over every row of the source dataframe (index label and row contents).
for index, row in ass_materials.iterrows():
    # Walk the material names in this row together with their position in the list.
    for i, element in enumerate(row["tdaa_matParent"]):
        # Create (or fill) the column named after the material and store the
        # quantity found at the same position in the tdaa_matParentQty list.
        df_matParent_with_Qty.loc[index, element] = row['tdaa_matParentQty'][i]

# Rows with empty lists are never written above, so restore them
# and use 0 where a material is absent.
df_matParent_with_Qty = df_matParent_with_Qty.reindex(ass_materials.index).fillna(0)

df_matParent_with_Qty.head(10)
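
Since suggestions were invited: one possible way to avoid the cell-by-cell `.loc` assignment is to pair names with quantities row by row and let pandas build the frame in a single call. This is only a sketch. It assumes both columns already hold Python lists of equal length per row, and the small `ass_materials` frame and the `qty_by_material` name below are stand-ins made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the question's data; the real frame is `ass_materials`.
ass_materials = pd.DataFrame({
    "tdaa_matParent": [[], ["BCA_Aluminum"], ["Dye Penetrant Solution", "BCA_Aluminum"]],
    "tdaa_matParentQty": [[], [1.3458], [0.002118882, 1.3458]],
})

# Pair each material name with its quantity, one dict per row (empty lists give {}).
records = [
    dict(zip(names, qtys))
    for names, qtys in zip(ass_materials["tdaa_matParent"],
                           ass_materials["tdaa_matParentQty"])
]

# Build the frame in one call; materials absent from a row become NaN, then 0.
qty_by_material = pd.DataFrame(records, index=ass_materials.index).fillna(0)
print(qty_by_material)
```

Rows with empty lists are kept as all-zero rows because every row contributes a dict, even an empty one.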

Answer 2

This is how I would process the sample data from the question using built-in Python tools:

from collections import OrderedDict
import pandas as pd

# simple case: the material names are known before the data is processed, so a single for loop is enough
# OrderedDict is used to preserve the order of the material names during processing
base_result = OrderedDict([
    ('BCA_Aluminum', .0),
    ('Dye Penetrant Solution', .0)])
result = list()

with open('1.txt', mode='r', encoding='UTF-8') as file:

    # skip header
    file.readline()

    for line in file:

        # copy base_result to reuse it during the looping
        base_result_copy = base_result.copy()

        # modify base result only if there are values in the current line
        # (strip first so a last line without a trailing newline is handled too)
        line = line.strip()
        if line != '[],[]':
            names, values = line.strip('[]').split('],[')
            for name, value in zip(names.split(', '), values.split(', ')):
                base_result_copy[name] = float(value)

        # append new line (base or modified) to the result as a plain list
        result.append(list(base_result_copy.values()))

# turn list of lists into pandas dataframe
result = pd.DataFrame(result, columns=list(base_result.keys()))
print(result)

Output:

   BCA_Aluminum  Dye Penetrant Solution
0        0.0000                0.000000
1        0.0000                0.000000
2        0.0000                0.000000
3        1.3458                0.000000
4        1.3458                0.000000
5        1.3458                0.000000
6        1.3458                0.000000
7        0.0000                0.000000
8        1.3458                0.002119

0.002119 is shown instead of 0.002118882 because of how pandas displays floats by default; the original value 0.002118882 is preserved in the underlying data.
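
To check this, one can read the stored value directly or temporarily raise the display precision. The snippet below is a small sketch: the one-row `result` frame is a stand-in for the frame built above, and `stored` and `shown` are names chosen for illustration.

```python
import pandas as pd

# Small stand-in for the `result` frame built above.
result = pd.DataFrame({
    "BCA_Aluminum": [1.3458],
    "Dye Penetrant Solution": [0.002118882],
})

# The stored value is the full float, regardless of how the frame is printed.
stored = result.loc[0, "Dye Penetrant Solution"]
print(stored)

# Temporarily raise the display precision to print more decimal places.
with pd.option_context("display.precision", 9):
    shown = result.to_string()
print(shown)
```

Outside the `with` block the display option returns to its previous setting, so other printed frames are unaffected.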
