Python: In a DataFrame, how do I find the year that strings from one column appear in another column?

Question

I have a DataFrame and want to go through every string in column c2. For each one, I want to print the string, the year in which it appears in c2, and the first year in which it appears in column c1, if it appears there at all. I also want to store the difference between the two years in another column. There are NaN values in c2.

Example df:

id   year     c1                c2
0    1999     luke skywalker    han solo
1    2000     leia organa       r2d2
2    2001     han solo          finn
3    2002     r2d2              NaN
4    2004     finn              c3po
5    2002     finn              NaN
6    2005     c3po              NaN

Example printed result:

c2            year in c2   year in c1     delta
han solo      1999         2001           2
r2d2          2000         2002           2
finn          2001         2004           3
c3po          2004         2005           1

I'm using Jupyter notebook with python and pandas. Thanks!

Answer 1

You can do it in steps like this:

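# Keep only the rows that have a value in c2.
df1 = df[df.c2.notnull()].copy()

# First year (in row order) in which each c1 string appears.
s = df.groupby('c1')['year'].first()
# Look up each c2 string in that Series.
df1['year in c1'] = df1.c2.map(s)

df1 = df1.rename(columns={'year':'year in c2'})

# Difference between the two years.
df1['delta'] = df1['year in c1'] - df1['year in c2']

print(df1[['c2','year in c2','year in c1', 'delta']])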

Output:

         c2  year in c2  year in c1  delta
0  han solo        1999        2001      2
1      r2d2        2000        2002      2
2      finn        2001        2004      3
4      c3po        2004        2005      1
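
To try the steps above end to end, here is a minimal, self-contained sketch. It assumes the sample data from the question with `id` acting as the index, and the names `first_year_in_c1` and `result` are illustrative rather than taken from the answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question (treating id as the index is an assumption).
df = pd.DataFrame({
    "year": [1999, 2000, 2001, 2002, 2004, 2002, 2005],
    "c1": ["luke skywalker", "leia organa", "han solo", "r2d2",
           "finn", "finn", "c3po"],
    "c2": ["han solo", "r2d2", "finn", np.nan, "c3po", np.nan, np.nan],
})

# Keep only the rows that have a value in c2.
result = df[df["c2"].notna()].copy()

# First year, in row order, in which each c1 string appears.
first_year_in_c1 = df.groupby("c1")["year"].first()

# Look up each c2 string in that Series; strings absent from c1 become NaN.
result["year in c1"] = result["c2"].map(first_year_in_c1)
result = result.rename(columns={"year": "year in c2"})
result["delta"] = result["year in c1"] - result["year in c2"]

print(result[["c2", "year in c2", "year in c1", "delta"]])
```

Note that `first()` picks the first occurrence in row order, not the smallest year. In the sample data `finn` appears in c1 for 2004 before 2002, so the lookup returns 2004, which matches the expected result in the question. If the earliest year is what you need instead, `min()` in place of `first()` would be the natural substitute.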

Answer 2

Here is one way.

# Look up each c2 string's first year in c1; unmatched strings become 0.
df['year_c1'] = df['c2'].map(df.groupby('c1')['year'].agg('first'))\
                        .fillna(0).astype(int)

df = df.rename(columns={'year': 'year_c2'})
df['delta'] = df['year_c1'] - df['year_c2']

# Keep only the rows that have a value in c2.
df = df.loc[df['c2'].notnull(), ['id', 'year_c2', 'year_c1', 'delta']]

#    id  year_c2  year_c1  delta
# 0   0     1999     2001      2
# 1   1     2000     2002      2
# 2   2     2001     2004      3
# 4   4     2004     2005      1

Explanation

Map c1 to year, aggregating by "first".
Use this map on c2 to calculate year_c1.
Calculate delta as year_c1 minus year_c2.
Remove rows with null in c2 and order columns.
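
One caveat with these steps: `fillna(0)` turns a c2 string that never appears in c1 into year 0, which yields a large negative delta. A possible variation, sketched here on the assumption that you would rather keep such rows as missing, uses pandas' nullable `Int64` dtype. The data below is a shortened, hypothetical sample, and the `yoda` row is invented only to show the unmatched case.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; "yoda" never appears in c1.
df = pd.DataFrame({
    "year": [1999, 2000, 2001, 2003],
    "c1": ["luke skywalker", "han solo", "r2d2", "leia organa"],
    "c2": ["han solo", "r2d2", np.nan, "yoda"],
})

# First year, in row order, in which each c1 string appears.
first_year = df.groupby("c1")["year"].first()

# Keep rows with a value in c2 and rename year, as in the answer.
out = df.loc[df["c2"].notna(), ["c2", "year"]].rename(columns={"year": "year_c2"})

# Int64 (capital I) is the nullable integer dtype: unmatched strings
# stay as <NA> instead of being replaced by 0.
out["year_c1"] = out["c2"].map(first_year).astype("Int64")
out["delta"] = out["year_c1"] - out["year_c2"]

print(out)
```

The trade-off is that `year_c1` and `delta` can then contain `<NA>`, so any later arithmetic or filtering has to account for missing values.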

2 answers

solution1
1 ACCEPTED 2018-03-08 17:53:24

solution2
0 2018-03-08 18:01:55
