Iterate through each row and column in a DataFrame and perform an action on the column value
Iterate through each row of a column and perform an operation
I have this data:
data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns = ['names', 'category'])
df
names    category
abc - a  A
def - b  B
ghi - c  C
jkl - d  D
My desired output is:
names  division  category
abc    a         A
def    b         B
ghi    c         C
jkl    d         D
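There are many ways to do this, but I want to do it with the following logic:
Iterate through each row of the 'names' column, store each value in 'st1', and then: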
first, middle, last = st1.partition(' - ')
df['names'] = first
df['division'] = last
and assign the results back to the DataFrame one row at a time. Please help me get the desired output in Python.
You can do it like this:
df[['names','division']] = df.names.str.split(' - ',expand=True)
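As a variant that stays closer to the partition logic in the question, pandas also provides `Series.str.partition`. The following is a minimal sketch, assuming the same sample data as the question; the intermediate name `parts` is only illustrative.

```python
import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])

# str.partition splits at the first occurrence of the separator and returns
# three columns (0, 1, 2): text before, the separator itself, text after
parts = df['names'].str.partition(' - ')

df['names'] = parts[0]
df['division'] = parts[2]

# Reorder the columns to match the desired output
df = df[['names', 'division', 'category']]
print(df)
```

Unlike `str.split`, this always yields exactly three columns, even if a name happens to contain the separator more than once.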
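Create the DataFrame as before, then iterate over all rows of names and category, split each name on the "-", append the pieces to a new dataset, and finally convert that into another DataFrame, as shown below: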
import pandas as pd
data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns = ['names', 'category'])
newdata = []
for names, category in zip(df.names, df.category):
    # split on the dash, then strip the surrounding spaces from each part
    name, division = names.split("-")
    newdata.append([name.strip(), division.strip(), category])
new_df = pd.DataFrame(newdata, columns = ['names', 'division', 'category'])
print(new_df)
The resulting new DataFrame:
>>> new_df
names division category
0 abc a A
1 def b B
2 ghi c C
3 jkl d D
I am testing GitHub Copilot to see how it solves Stack Overflow questions.
# Solution 1
import pandas as pd
import numpy as np
data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])
# Iterate through each rows of column-names, and store each value in 'st1' and then ->
# first, middle, last = st1.partition(' - ')
# df['names'] = first
# df['division'] = last
# and also assigning it to dataframe one by one, please help me to get my desired output in python.
for index, row in df.iterrows():
    st1 = row['names']
    first, middle, last = st1.partition(' - ')
    df.loc[index, 'names'] = first
    df.loc[index, 'division'] = last
# Explain what is df.loc
# df.loc[row index, column index]
# df.loc[0, 'names'] = first
# df.loc[0, 'division'] = last
print(df)
Output:
names category division
0 abc A a
1 def B b
2 ghi C c
3 jkl D d
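Writing to the DataFrame cell by cell inside `iterrows()` tends to be slow on larger frames. A minimal sketch of the same partition logic, assuming the same sample data, collects the pieces in plain lists and assigns each column once; the list names `firsts` and `lasts` are illustrative only.

```python
import pandas as pd

data = [['abc - a', 'A'], ['def - b', 'B'], ['ghi - c', 'C'], ['jkl - d', 'D']]
df = pd.DataFrame(data, columns=['names', 'category'])

firsts, lasts = [], []
# Iterate over the values of the 'names' column only
for st1 in df['names']:
    first, middle, last = st1.partition(' - ')
    firsts.append(first)
    lasts.append(last)

# Assign each column in one step instead of one cell at a time
df['names'] = firsts
df['division'] = lasts
print(df)
```

The resulting column order is names, category, division, the same as in the output above.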
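Since you want to iterate over each row of the DataFrame and work with the rows individually, this answer uses a little NumPy to do the job. Because you are splitting strings, note that Python's .partition() works much like .split() here, but NumPy's np.partition() is a different operation (it partially sorts an array), so .split() is used below.
Here are the packages you need: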
import pandas as pd
import numpy as np
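Before iterating over the rows, you need to create a new column named 'division' with .insert(). I used np.nan as the fill value, but you can use any value you want. Note that a column filled with np.nan has a float dtype, so recent pandas versions may warn when strings are later written into it: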
df.insert(1, 'division', np.nan)
Now you can iterate over the rows using the pandas iterrows() method.
# index is the row label; row is a pandas Series holding that row's values
for index, row in df.iterrows():
    # convert the row's values from a Series to a list
    row = list(row)
    # remove the np.nan placeholder from the 'division' column created above
    row.pop(1)
    # split the 'names' value; yields the values for 'names' and 'division'
    new_row = row[0].split(' - ')
    # append the value from the 'category' column
    new_row = np.append(new_row, row[1])
    # save the new row to the DataFrame (iloc is positional, so this assumes
    # the default 0..n-1 index, where label and position coincide)
    df.iloc[index] = new_row
Here is the output:
| | names | division | category |
|---:|:--------|:-----------|:-----------|
| 0 | abc | a | A |
| 1 | def | b | B |
| 2 | ghi | c | C |
| 3 | jkl | d | D |
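The answer above contrasts .partition() and .split(). As a small illustration of how Python's string methods differ from each other and from NumPy's function of the same name, here is a hedged sketch; the sample array `arr` is made up for the demonstration.

```python
import numpy as np

st1 = 'abc - a'

# str.partition always returns a 3-tuple: (before, separator, after)
print(st1.partition(' - '))   # ('abc', ' - ', 'a')

# str.split returns a list of the pieces, without the separator
print(st1.split(' - '))       # ['abc', 'a']

# np.partition is unrelated to strings: it rearranges an array so that the
# element at position kth sits where it would be in sorted order
arr = np.array([4, 1, 3, 2])
part = np.partition(arr, 1)
print(part[1])                # 2
```

For splitting text such as 'abc - a', only the two string methods are relevant; np.partition solves a different problem.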