I have a column in a pandas DataFrame in which a string sometimes repeats:
col1 | col2 |
---|---|
1 | hello |
2 | bye |
3 | hello |
4 | morning |
5 | night |
6 | hello |
What I would like to do is change every occurrence of "hello" except the first into "hello again", so the first occurrence of "hello" stays the same:
col1 | col2 |
---|---|
1 | hello |
2 | bye |
3 | hello again |
4 | morning |
5 | night |
6 | hello again |
You can use

    df['col2'] = df['col2'].str.split(expand=True)[0]

`str.split()` splits on whitespace by default, and `expand=True` returns the pieces as separate columns instead of a single column of lists; `[0]` then keeps only the first word of each value.
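To make the effect of `str.split(expand=True)[0]` concrete, here is a small self-contained sketch on made-up sample data (the three values below are illustrative, not from the question). It shows the intermediate DataFrame that `expand=True` produces and the result of keeping column `0`:

```python
import pandas as pd

# Illustrative sample data: one value already contains two words
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["hello", "hello again", "bye"]})

# With no pattern, str.split() splits on runs of whitespace;
# expand=True returns a DataFrame with one column per piece
# (rows with fewer pieces are padded with None)
parts = df["col2"].str.split(expand=True)
print(parts)

# Column 0 holds the first word of every row
df["col2"] = parts[0]
print(df["col2"].tolist())  # ['hello', 'hello', 'bye']
```

Note that this maps "hello again" back to "hello", i.e. it keeps the first word of each value; it does not rename repeated occurrences.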
You can find the index labels of the rows whose value is "hello" and then modify all but the first of them using `pandas.DataFrame.loc`:

    In [1]: import pandas as pd

    In [2]: df = pd.DataFrame(data={'col1': [1, 2, 3, 4, 5, 6],
       ...:                         'col2': ['hello', 'bye', 'hello', 'morning', 'night', 'hello']})

    In [3]: df
    Out[3]:
       col1     col2
    0     1    hello
    1     2      bye
    2     3    hello
    3     4  morning
    4     5    night
    5     6    hello

    In [4]: hello_indices = df.index[df['col2'] == 'hello']

    In [5]: hello_indices
    Out[5]: Int64Index([0, 2, 5], dtype='int64')

    In [6]: df.loc[hello_indices[1:], 'col2'] = 'hello again'

    In [7]: df
    Out[7]:
       col1         col2
    0     1        hello
    1     2          bye
    2     3  hello again
    3     4      morning
    4     5        night
    5     6  hello again
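An alternative sketch (a swapped-in technique, not the method used in the session above) builds a boolean mask with `Series.duplicated()` instead of collecting index labels. `duplicated()` with its default `keep="first"` is `True` for every repeat of a value after its first appearance, so combining it with an equality test selects exactly the "hello" rows after the first one:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5, 6],
    "col2": ["hello", "bye", "hello", "morning", "night", "hello"],
})

# True for rows equal to "hello" that are not the first such row
mask = df["col2"].eq("hello") & df["col2"].duplicated()

# Assigning through a boolean mask selects rows positionally aligned
# with the mask, so it does not rely on the index labels being unique
df.loc[mask, "col2"] = "hello again"
print(df)
```

The design choice here is that the mask is aligned row by row, whereas selecting by index labels can touch extra rows if the DataFrame's index contains duplicate labels.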