I have a dataset like this:
import pandas as pd
df = pd.DataFrame([[0, 0], [2, 2]], columns=('feature1', 'feature2'))
Now I would like to add an extra column:
df['c'] = ""
And then loop through the DataFrame to fill column C with the contents of both feature1 and feature2:
for index, row in df.iterrows():
    subject = row["feature1"]
    content = row["feature2"]
    row["C"] = subject, content
However, if I print the DataFrame now, something seems to have gone wrong, because column C is empty.
If you want to build a tuple out of two columns, be explicit and keep it simple:
df['c'] = df.apply(tuple, axis=1)
df
Out[7]:
   feature1  feature2       c
0         0         0  (0, 0)
1         2         2  (2, 2)
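One caveat with df.apply(tuple, axis=1): it packs every column of the row into the tuple. If you run it after the question's df['c'] = "" line, each tuple would also pick up the empty string from c. A minimal sketch, using the question's two column names, that selects the source columns first:

```python
import pandas as pd

df = pd.DataFrame([[0, 0], [2, 2]], columns=["feature1", "feature2"])
df["c"] = ""  # placeholder column, as in the question

# Applying to the whole frame includes the placeholder column in every tuple
whole_row = df.apply(tuple, axis=1)
print(len(whole_row[0]))  # 3: feature1, feature2 and the placeholder ''

# Restrict the apply to the two source columns so each tuple has exactly two items
df["c"] = df[["feature1", "feature2"]].apply(tuple, axis=1)
print(df)
```

Being explicit about the column list also keeps the result stable if more columns are added to the frame later.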
EdChum has you covered in the comments on how to fix your approach: you should be using .loc for indexing. However, you can achieve the same thing much more simply, without resorting to row iteration, by using zip:
In[43]: df['c'] = list(zip(df.feature1, df.feature2))
In[44]: df
Out[44]:
   feature1  feature2       c
0         0         0  (0, 0)
1         2         2  (2, 2)
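The zip version names each column explicitly. If the columns to combine come from a list, DataFrame.itertuples(index=False, name=None) yields plain tuples of row values and can stand in for the hand-written zip. A sketch, where cols and c_zip are illustrative names not taken from the answer:

```python
import pandas as pd

df = pd.DataFrame([[0, 0], [2, 2]], columns=["feature1", "feature2"])
cols = ["feature1", "feature2"]  # columns to combine (illustrative)

# name=None makes itertuples yield plain tuples instead of namedtuples;
# index=False leaves the row label out of each tuple.
df["c"] = list(df[cols].itertuples(index=False, name=None))

# Equivalent zip form for an arbitrary column list
df["c_zip"] = list(zip(*(df[col] for col in cols)))
print(df)
```

Both forms still need the surrounding list(...), because the column assignment expects a sized sequence rather than a one-shot iterator.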
Another option is to move the two columns into a MultiIndex, whose values are already tuples:
df.assign(c=df.set_index(['feature1', 'feature2']).index.to_series().values)
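Note that assign returns a new DataFrame and leaves df unchanged, so the result has to be stored somewhere. A short sketch with the question's column names (out is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame([[0, 0], [2, 2]], columns=["feature1", "feature2"])

# set_index turns the two columns into a MultiIndex whose labels are tuples;
# .values on the resulting Series is an object array of those tuples.
out = df.assign(c=df.set_index(["feature1", "feature2"]).index.to_series().values)

print("c" in df.columns)  # False: assign did not modify df
print(out)
```

To update df itself, reassign the result: df = df.assign(...).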
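You never updated the original DataFrame; you only updated a variable named row. Depending on the data types, iterrows() can hand you a copy of each row, so writing to row has no effect on df. But for ease of remembering (not the most efficient, obviously):
df['C'] = list(zip(df.feature1, df.feature2))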
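If you would rather keep an explicit loop, one sketch (a swapped-in technique, not the .loc fix mentioned in the comments) is to collect the tuples in a plain list and assign the column once after the loop, so nothing is ever written into the rows that iterrows() yields. Note also that pandas column names are case-sensitive: the question creates "c" but writes to "C", which would be a separate column even if the write reached df.

```python
import pandas as pd

df = pd.DataFrame([[0, 0], [2, 2]], columns=["feature1", "feature2"])

pairs = []
for _, row in df.iterrows():
    # row is a Series built for this iteration; read from it, don't write to it
    pairs.append((row["feature1"], row["feature2"]))

# A single assignment writes all tuples into df at once, in row order
df["c"] = pairs
print(df)
```

This stays close to the question's structure, but for larger frames the zip-based one-liners avoid the per-row Series construction that iterrows() performs.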