

Python: In a DataFrame, how do I find the year that strings from one column appear in another column?

I have a dataframe and want to loop through all strings in column c2, printing each string and the year it appears in c2, along with the first year it appears in column c1 (if it exists in c1). I then want to tally the difference between the two years in another column. There are NaN values in c2.

Example df:

id   year     c1                c2
0    1999     luke skywalker    han solo
1    2000     leia organa       r2d2
2    2001     han solo          finn
3    2002     r2d2              NaN
4    2004     finn              c3po
5    2002     finn              NaN
6    2005     c3po              NaN

Example printed result:

c2            year in c2   year in c1     delta
han solo      1999         2001           2
r2d2          2000         2002           2
finn          2001         2004           3
c3po          2004         2005           1

I'm using Jupyter Notebook with Python and pandas. Thanks!

You can do it in steps like this:

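# Keep only the rows where c2 is not NaN
df1 = df[df.c2.notnull()].copy()

# First year (in row order) in which each string appears in c1
s = df.groupby('c1')['year'].first()
df1['year in c1'] = df1.c2.map(s)

df1 = df1.rename(columns={'year':'year in c2'})

# A c2 string that never appears in c1 gets NaN here
df1['delta'] = df1['year in c1'] - df1['year in c2']

print(df1[['c2','year in c2','year in c1', 'delta']])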

Output:

         c2  year in c2  year in c1  delta
0  han solo        1999        2001      2
1      r2d2        2000        2002      2
2      finn        2001        2004      3
4      c3po        2004        2005      1
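
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. It rebuilds the example frame from the question by hand, assuming `id` is an ordinary column rather than the index. The names `first_year_c1` and `result` are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumption: id is a regular column).
df = pd.DataFrame({
    "id": range(7),
    "year": [1999, 2000, 2001, 2002, 2004, 2002, 2005],
    "c1": ["luke skywalker", "leia organa", "han solo", "r2d2", "finn", "finn", "c3po"],
    "c2": ["han solo", "r2d2", "finn", np.nan, "c3po", np.nan, np.nan],
})

# First year, in row order, in which each c1 string appears.
first_year_c1 = df.groupby("c1")["year"].first()

# Keep the rows that have a c2 value, then look up each c2 string in the c1 mapping.
result = df.loc[df["c2"].notna(), ["c2", "year"]].rename(columns={"year": "year in c2"})
result["year in c1"] = result["c2"].map(first_year_c1)
result["delta"] = result["year in c1"] - result["year in c2"]

print(result)
```

Note that `first()` follows row order, not chronological order: `finn` appears in c1 in 2004 (row 4) before 2002 (row 5), so the lookup returns 2004, which matches the expected result in the question. If the earliest year is what you want instead, swapping `first()` for `min()` would be one option.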

Here is one way.

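# Map each c2 string to the first year it appears in c1;
# unmatched strings and NaN become 0 so the column can be cast to int
df['year_c1'] = df['c2'].map(df.groupby('c1')['year'].agg('first'))\
                        .fillna(0).astype(int)

df = df.rename(columns={'year': 'year_c2'})
df['delta'] = df['year_c1'] - df['year_c2']

# Drop rows where c2 is NaN and keep only the columns of interest
df = df.loc[df['c2'].notnull(), ['id', 'year_c2', 'year_c1', 'delta']]

#    id  year_c2  year_c1  delta
# 0   0     1999     2001      2
# 1   1     2000     2002      2
# 2   2     2001     2004      3
# 4   4     2004     2005      1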

Explanation

  • Map c1 to year, aggregating by "first".
  • Use this map on c2 to calculate year_c1.
  • Calculate delta as year_c1 minus year_c2.
  • Remove rows with null in c2 and select the output columns.
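
One case the example data does not exercise is a c2 string that never appears in c1. With `fillna(0)`, such a row would get `year_c1 = 0` and a large negative delta. The sketch below shows one possible alternative, which is an assumption and not part of the answer above: casting to pandas' nullable `Int64` dtype, so the years stay whole numbers and a missing match shows up as `<NA>`. The `yoda` row and the names `lookup` and `out` are hypothetical, added only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the example: "yoda" appears in c2 but never in c1.
df = pd.DataFrame({
    "id": range(4),
    "year": [1999, 2000, 2001, 2002],
    "c1": ["luke skywalker", "leia organa", "han solo", "r2d2"],
    "c2": ["han solo", "yoda", np.nan, np.nan],
})

# First year, in row order, in which each c1 string appears.
lookup = df.groupby("c1")["year"].first()

# Keep rows with a c2 value and rename year, as in the steps listed above.
out = df.loc[df["c2"].notna(), ["id", "c2", "year"]].rename(columns={"year": "year_c2"})

# Nullable Int64 keeps integer years and marks an unmatched string as <NA>.
out["year_c1"] = out["c2"].map(lookup).astype("Int64")
out["delta"] = out["year_c1"] - out["year_c2"]

print(out)
```

Here the `han solo` row gets a delta of 2, while the `yoda` row keeps `<NA>` in both `year_c1` and `delta` instead of a misleading number.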


