Issue: I am getting a SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
Goal: Separate the column data into separate columns, all in the same DataFrame.
Input: A DataFrame with 2 columns. The first column is an email address and the second contains a list of dates separated by semicolons.
Code:
for dt in lunch_dates:
    roulette_data[dt] = roulette_data['date'].str.contains(dt).map(bool_conversion)
What I want this code to do (and it does): Add a new column for each date found (dt) in the originating date column.
Question: How can I use iloc in this case to ensure I am not working on a possible copy of the DataFrame in memory?
I have no data to test this on, but the code below should work (replace 'email_column_name' with the name of your email column):
dates = pd.get_dummies(
    roulette_data.set_index('email_column_name')['date']\
    .str.split(';', expand=True)\
    .stack().reset_index(level=1, drop=True)
)\
.reset_index().groupby('email_column_name').sum()
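A shorter route to the same indicator columns may be Series.str.get_dummies, which splits on a separator and builds the columns in one call. This is a different technique from the pipeline above, shown only as a minimal sketch: the sample rows and the column names 'email' and 'date' are assumptions, and it presumes the dates are separated by ';' with no surrounding spaces.

```python
import pandas as pd

# Hypothetical sample data; the column names 'email' and 'date' are assumptions.
roulette_data = pd.DataFrame({
    'email': ['a@example.com', 'b@example.com'],
    'date': ['2020-01-06;2020-01-13', '2020-01-13;2020-01-20'],
})

# One indicator column per distinct date found in the ';'-separated strings.
indicators = roulette_data['date'].str.get_dummies(sep=';')

# join returns a new DataFrame, so nothing is assigned into roulette_data itself.
result = roulette_data.join(indicators)
print(result)
```

Because join builds a new DataFrame instead of assigning columns into an existing one, this form should not raise the warning, and the email and date columns stay alongside the new ones.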
To see how this works step by step, take a small example DataFrame:
df = pd.DataFrame({'col1': ['record1', 'record2'],
                   'col2': ["this is good text", "but this is even better"]})
df
#      col1                     col2
#0  record1        this is good text
#1  record2  but this is even better
We first set the index to be col1, then select col2 so that we can use its .str.split method to split the lines into individual words:
df.set_index('col1')['col2'].str.split(expand=True)
#            0     1     2     3       4
#col1
#record1  this    is  good  text    None
#record2   but  this    is  even  better
Then we use stack to change the shape and reset_index to get rid of the unnecessary index level:
df.set_index('col1')['col2'].str.split(expand=True)\
    .stack().reset_index(level=1, drop=True)
#col1
#record1 this
#record1 is
#record1 good
#record1 text
#record2 but
#record2 this
#record2 is
#record2 even
#record2 better
#dtype: object
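We wrap this whole expression in pd.get_dummies(), which creates one indicator column per distinct word (recent pandas versions display these indicators as True/False rather than 0/1):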
pd.get_dummies(df.set_index('col1')['col2'].str.split(expand=True).stack().reset_index(level=1, drop=True))
#         better  but  even  good  is  text  this
#col1
#record1       0    0     0     0   0     0     1
#record1       0    0     0     0   1     0     0
#record1       0    0     0     1   0     0     0
#record1       0    0     0     0   0     1     0
#record2       0    1     0     0   0     0     0
#record2       0    0     0     0   0     0     1
#record2       0    0     0     0   1     0     0
#record2       0    0     1     0   0     0     0
#record2       1    0     0     0   0     0     0
Finally, we call reset_index to turn the index (col1, or in your case the e-mail column) back into a regular column, group by col1, and sum the indicators so that each record ends up on a single row:
pd.get_dummies(
    df.set_index('col1')['col2']\
    .str.split(expand=True)\
    .stack().reset_index(level=1, drop=True)
)\
.reset_index().groupby('col1').sum()
#         better  but  even  good  is  text  this
#col1
#record1       0    0     0     1   1     1     1
#record2       1    1     1     0   1     0     1
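On the warning itself: it usually appears when the DataFrame being assigned to was created by slicing or filtering another DataFrame. If that is what happened with roulette_data (an assumption, since the question does not show how it was built), taking an explicit .copy() before the loop is the usual fix, and no iloc is needed. The sketch below keeps the loop from the question; all_data, the sample rows, and the contents of lunch_dates and bool_conversion are hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical stand-in for a larger DataFrame that roulette_data is sliced from.
all_data = pd.DataFrame({
    'email': ['a@example.com', 'b@example.com', 'c@example.com'],
    'date': ['2020-01-06;2020-01-13', '2020-01-13', None],
})

# .copy() makes the filtered result explicitly independent of all_data,
# so later column assignments do not trigger SettingWithCopyWarning.
roulette_data = all_data[all_data['date'].notna()].copy()

# Assumed stand-ins for the question's lunch_dates and bool_conversion.
lunch_dates = ['2020-01-06', '2020-01-13']
bool_conversion = {True: 'Yes', False: 'No'}

for dt in lunch_dates:
    # regex=False treats dt as a literal string rather than a pattern.
    roulette_data[dt] = roulette_data['date'].str.contains(dt, regex=False).map(bool_conversion)

print(roulette_data)
```

The new columns land only in roulette_data; all_data is left unchanged.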