Iterate through each row of a column and perform an operation

I have my data:

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns = ['names', 'category'])
df
names   category
abc - a   A
def - b   B
ghi - c   C
jkl - d   D

My desired output is:

names     division    category
    abc      a          A
    def      b          B
    ghi      c          C
    jkl      d          D

There are many ways to do this, but I want to do it using this logic:

Iterate through each row of the 'names' column, store each value in 'st1', and then:

first, middle, last = st1.partition(' - ')
df['names'] = first
df['division'] = last

and assign the results to the DataFrame one row at a time. Please help me get the desired output in Python.

You can do it like this:

df[['names','division']] = df.names.str.split(' - ',expand=True)
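
Note that assigning this way appends 'division' as the last column. If the column order from the desired output matters, a minimal sketch along these lines might help. It assumes the question's sample data; the n=1 argument and the final reordering step are additions, not part of the one-liner above:

```python
import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])

# Split each value on ' - ' at most once (n=1) and expand the parts into two columns
df[['names', 'division']] = df['names'].str.split(' - ', n=1, expand=True)

# Reorder the columns to match the layout asked for in the question
df = df[['names', 'division', 'category']]
print(df)
```

Limiting the split with n=1 keeps the result at two columns even if a name happens to contain the separator more than once.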

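Create the DataFrame as before, then iterate over all rows of names and category, split each name on "-", append the pieces to a new dataset, and then convert that into another DataFrame, as follows: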

import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns = ['names', 'category'])

newdata = []
for names, category in zip(df.names, df.category):
    name, division = names.split("-")
    newdata.append([name.strip(), division.strip(), category])

new_df = pd.DataFrame(newdata, columns = ['names', 'division', 'category'])

Printing the new DataFrame gives:

>>> new_df
  names division category
0   abc        a        A
1   def        b        B
2   ghi        c        C
3   jkl        d        D
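
One caveat: the loop unpacks names.split("-") into exactly two variables, so a value containing more than one hyphen would raise a ValueError. A hedged variant of the same loop, using str.partition as in the question, might look like the sketch below. The row 'mno-p - e' is made up purely to illustrate a name with an extra hyphen:

```python
import pandas as pd

# 'mno-p - e' is a hypothetical row, added only to show a name with an extra hyphen
data = [['abc - a', 'A'], ['mno-p - e', 'E']]
df = pd.DataFrame(data, columns=['names', 'category'])

newdata = []
for names, category in zip(df['names'], df['category']):
    # partition splits at the first ' - ' only and always returns three parts
    name, _sep, division = names.partition(' - ')
    newdata.append([name.strip(), division.strip(), category])

new_df = pd.DataFrame(newdata, columns=['names', 'division', 'category'])
print(new_df)
```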

I am testing GitHub Copilot to see how it solves Stack Overflow questions.

# Solution 1
import pandas as pd
import numpy as np

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])


# Iterate through each rows of column-names, and store each value in 'st1' and then ->
# first, middle, last = st1.partition(' - ')
# df['names'] = first
# df['division'] = last
# and also assigning it to dataframe one by one, please help me to get my desired output in python.


for index, row in df.iterrows():
    st1 = row['names']
    first, middle, last = st1.partition(' - ')
    df.loc[index, 'names'] = first
    df.loc[index, 'division'] = last

# Explain what is df.loc
# df.loc[row index, column index]
# df.loc[0, 'names'] = first
# df.loc[0, 'division'] = last

print(df)

Output:

  names category division
0   abc        A        a
1   def        B        b
2   ghi        C        c
3   jkl        D        d
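
Writing one cell at a time with df.loc inside iterrows() works, but it can be slow on large frames. As a rough sketch, the same partition logic can be applied to the whole column at once with Series.str.partition, assuming the question's sample data:

```python
import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])

# str.partition returns a DataFrame with three columns: 0 = first, 1 = separator, 2 = last
parts = df['names'].str.partition(' - ')

df['names'] = parts[0]
df['division'] = parts[2]
print(df)
```

As in the loop version, 'division' ends up as the last column.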

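Since you want to iterate over each row of the DataFrame and work with the rows individually, this approach uses a little NumPy alongside Pandas. For the splitting step, Python's str.partition() works much like str.split() here; NumPy's np.partition() is an unrelated partial-sorting function.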

Here are the packages you need:

import pandas as pd
import numpy as np

Before iterating over the rows, you need to create a new column named 'division' using .insert() (I used np.nan as the filler, but you can use any value you want):

df.insert(1, 'division', np.nan)

Now you can iterate over the rows with Pandas' iterrows() method.

# iterrows() yields (index, row) pairs: index is the row label, row is a Series of the row's values
for index, row in df.iterrows():

    # convert the row from a Series to a plain list
    row = list(row)

    # remove the np.nan placeholder from the 'division' column created above
    row.pop(1)

    # split the 'names' value on ' - '; gives the values for the 'names' and 'division' columns
    new_row = row[0].split(' - ')

    # append the value from the 'category' column
    new_row = np.append(new_row, row[1])

    # write the new row back; iloc is positional, which matches the labels of the default 0..n-1 index
    df.iloc[index] = new_row

Here is the output:

|    | names   | division   | category   |
|---:|:--------|:-----------|:-----------|
|  0 | abc     | a          | A          |
|  1 | def     | b          | B          |
|  2 | ghi     | c          | C          |
|  3 | jkl     | d          | D          |
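
For reference, iterrows() hands back each row as an (index, Series) pair rather than a plain tuple, which is why the loop converts it with list(). A quick check, using two rows of the question's data:

```python
import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B']]
df = pd.DataFrame(data, columns=['names', 'category'])

# Take only the first (index, row) pair from the iterator
index, row = next(df.iterrows())

print(type(row).__name__)  # the row is a pandas Series
print(list(row))           # its values as a plain Python list
```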

