
How do I add a new column to an existing dataframe and fill it with partial data from another column?

I have a dataframe named jobs (the original post showed a screenshot of it).

I need to add a new column 'year' to jobs data frame. This column should contain the corresponding year for each post_date (which is already a column). For example: for post_date value 2017-08-16 'year' value should be 2017.

I am unsure how to insert a new column while also pulling data from a pre-existing column.

Use dt.year:

jobs['year'] = pd.to_datetime(jobs['post_date'], errors='coerce').dt.year
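As a rough illustration of what errors='coerce' does in the line above, here is a minimal sketch. The sample rows are made up (the real jobs data is not shown in the question), and the final cast to the nullable Int64 dtype is an optional extra, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data; the real jobs dataframe is not shown in the question.
jobs = pd.DataFrame({"post_date": ["2017-08-16", "2018-01-02", "not a date"]})

# errors="coerce" turns values that cannot be parsed into NaT instead of raising.
parsed = pd.to_datetime(jobs["post_date"], errors="coerce")
jobs["year"] = parsed.dt.year

# A NaT row produces a missing year, which makes the column float.
# The nullable Int64 dtype keeps whole-number years next to missing values.
jobs["year"] = jobs["year"].astype("Int64")

print(jobs)
```

If every post_date parses cleanly, the cast is unnecessary because dt.year already returns integers.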

I would begin by transforming the column post_date into date format. After doing this, you could use a simple function to extract the year.

jobs["post_date"] = pd.to_datetime(jobs["post_date"])

should be enough to convert it to a datetime type. If it isn't, pass the format argument of pd.to_datetime (it accepts strptime-style codes such as %Y-%m-%d) to tell pandas the specific format of the "post_date" column, so that it is read as a date. After that, do the following:

jobs["year"] = jobs["post_date"].dt.year
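For the case where the automatic conversion does not read the dates correctly, a sketch with an explicit format might look like this. The day/month/year layout and the sample values are assumptions made for illustration; the question's own example (2017-08-16) would not need them.

```python
import pandas as pd

# Hypothetical data in day/month/year order, which is ambiguous without a format.
jobs = pd.DataFrame({"post_date": ["16/08/2017", "02/01/2018"]})

# format takes strptime-style codes: %d = day, %m = month, %Y = four-digit year.
jobs["post_date"] = pd.to_datetime(jobs["post_date"], format="%d/%m/%Y")

# With a datetime column in place, the year is available through the .dt accessor.
jobs["year"] = jobs["post_date"].dt.year

print(jobs)
```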

If I understand your question correctly, you want to add a new column of years to the existing dataframe, derived from a column it already contains. To extract only the year, the post_date column first has to hold datetime values; you can convert it with pandas' to_datetime and then read the year through the .dt accessor. To store the year values, you can simply do this:

jobs['year'] = jobs['post_date'].dt.year
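Note that this line only works once post_date has a datetime dtype. A small sketch of the difference, using made-up sample rows and a hypothetical accessor_failed flag just to record what happened:

```python
import pandas as pd

# Hypothetical sample data stored as plain strings, as it often is after read_csv.
jobs = pd.DataFrame({"post_date": ["2017-08-16", "2018-01-02"]})

# On a string column the .dt accessor is not available and raises AttributeError.
try:
    jobs["post_date"].dt.year
    accessor_failed = False
except AttributeError:
    accessor_failed = True

# After converting to datetime, the same expression works.
jobs["post_date"] = pd.to_datetime(jobs["post_date"])
jobs["year"] = jobs["post_date"].dt.year

print(accessor_failed)
print(jobs)
```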

