How do I extract the date from a column in a csv file using pandas?
This is the 'aired' column in the csv file.
Link to the csv file: https://drive.google.com/file/d/1w7kIJ5O6XIStiimowC5TLsOCUEJxuy6x/view?usp=sharing
I want to extract the date and the month (in words) from the date following the word 'from' and store the result in a separate column in another csv file. The 'from' is an obstruction: had the cell contained just the date, it could easily have been parsed as a timestamp.
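import re
import pandas as pd
file = pd.read_csv('file.csv')
result = []
for cell in file['aired']:
    # assumes cells look like "{'from': '2012-07-04', 'to': ...}";
    # a fixed slice such as cell[8:22] is 14 characters and cannot match '%Y-%m-%d'
    match = re.search(r"'from':\s*'(\d{4}-\d{2}-\d{2})", str(cell))
    if match is None:
        # keep the row so the output lines up with the input
        result.append((None, pd.NaT))
        continue
    date_ts = pd.to_datetime(match.group(1), format='%Y-%m-%d')
    result.append((date_ts.month_name(), date_ts))
df = pd.DataFrame(result, columns=['month', 'date'])
df.to_csv('result_file.csv', index=False)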
You are starting from a string and want to break out the data within it. The single quotes are a clue that this is a dict structure in string form. The Python standard library includes the ast (Abstract Syntax Trees) module, whose literal_eval function can read a string into a dict, gleaned from this SO answer: Convert a String representation of a Dictionary to a dictionary?
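As a quick illustration of what literal_eval does with one cell, here is a minimal sketch. The sample string is an assumption about how a value in the 'aired' column looks, not a value copied from the linked file.

```python
import ast

# Hypothetical cell value, assuming the column stores dict-like text
cell = "{'from': '2012-07-04', 'to': '2012-09-19'}"

# literal_eval only evaluates Python literals (dicts, strings, numbers, None, ...),
# so it does not execute arbitrary code the way eval would
parsed = ast.literal_eval(cell)
aired_from = parsed['from']

print(type(parsed).__name__, aired_from)
```

Once the cell is a real dict, the 'from' value is a plain date string with nothing in front of it.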
You want to apply that to your column to get the dict, at which point you expand it into separate columns using .apply(pd.Series), based on this SO answer: Splitting dictionary/list inside a Pandas Column into Separate Columns
Try the following:
import pandas as pd
import ast
df = pd.read_csv('AnimeList.csv')
# turn the pd.Series of strings into a pd.Series of dicts
aired_dict = df['aired'].apply(ast.literal_eval)
# turn the pd.Series of dicts into a pd.Series of pd.Series objects
aired_df = aired_dict.apply(pd.Series)
# pandas automatically translates that into a pd.DataFrame
# concatenate the remainder of the dataframe with the new data
df_aired = pd.concat([df.drop(['aired'], axis=1), aired_df], axis=1)
# convert the date strings to datetime values
df_aired['aired_from'] = pd.to_datetime(df_aired['from'])
df_aired['aired_to'] = pd.to_datetime(df_aired['to'])
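The question also asks for the month in words and for a separate csv output, which the snippet above stops short of. A possible last step is sketched below. It is only a sketch: the two sample rows and the 'title' column are hypothetical stand-ins for AnimeList.csv, and errors='coerce' is an added assumption so that missing dates become NaT rather than raising.

```python
import ast

import pandas as pd

# Hypothetical sample rows standing in for AnimeList.csv
sample = pd.DataFrame({
    'title': ['A', 'B'],
    'aired': [
        "{'from': '2012-07-04', 'to': '2012-09-19'}",
        "{'from': None, 'to': None}",
    ],
})

# string -> dict -> one column per dict key
aired_df = sample['aired'].apply(ast.literal_eval).apply(pd.Series)

# unparseable or missing values become NaT instead of raising
aired_from = pd.to_datetime(aired_df['from'], errors='coerce')

result = pd.DataFrame({
    'date': aired_from.dt.strftime('%Y-%m-%d'),
    'month': aired_from.dt.month_name(),  # month in words, e.g. 'July'
})

# with no path, to_csv returns the csv text;
# pass a path such as 'result_file.csv' to write a file instead
csv_text = result.to_csv(index=False)
print(csv_text)
```

Rows without a start date are kept with empty values, so the output stays aligned with the input rows.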