Python beginner - coding with dtale (command order problem)

For a university project in which I want to work with and learn Python, I stumbled upon the add-on D-Tale (dtale), which helps me analyze large amounts of party manifesto data.

Long story short: I added some filters (e.g. I only want to show rows with an edate >= 20140914). When I run the code, the filters don't seem to be applied. Could you please help me with that?

import dtale
import pandas as pd
df = pd.read_csv('https://manifestoproject.wzb.eu/down/data/2020b/datasets/MPDataset_MPDS2020b.csv')
d = dtale.show(df)

# DISCLAIMER: 'df' refers to the data you passed in when calling 'dtale.show'

import pandas as pd

if isinstance(df, (pd.DatetimeIndex, pd.MultiIndex)):
    df = df.to_frame(index=False)

# remove any pre-existing indices for ease of use in the D-Tale code, but this is not required
df = df.reset_index().drop('index', axis=1, errors='ignore')
df.columns = [str(c) for c in df.columns]  # update columns to strings in case they are numbers

df['edate'] = pd.to_datetime(df['edate'])  # convert the election dates to datetimes
d.open_browser()

So basically, my goal is not to have to redo the date filtering etc. every time, but to have all my progress saved and applied when I run the code.

Thanks for your help!

There are some other arguments you can pass to pd.read_csv() that will probably help you out here:

  • parse_dates: pass this argument a list of columns that pandas should convert to dates. This might replace your second-to-last line.
  • index_col: this lets you set an index explicitly, which should save you from the .to_frame() conversion.
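A minimal sketch of both arguments, using a tiny made-up CSV in place of the real MPDS file. It assumes the dates are ISO-formatted (YYYY-MM-DD); if the real edate column uses another layout, such as day-first dates, read_csv options like dayfirst=True may be needed. Using party as the index is only for illustration, and the rile values are invented:

```python
import io

import pandas as pd

# Made-up stand-in for the MPDS CSV (all values are invented).
csv_text = """party,edate,rile
11110,2010-09-19,-20.5
11320,2014-09-14,-5.1
11620,2018-09-09,3.7
"""

# parse_dates converts 'edate' to datetime64 while reading, so no separate
# pd.to_datetime step is needed; index_col makes 'party' the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["edate"], index_col="party")

# The question's filter (edate >= 20140914) is now a plain datetime comparison.
recent = df[df["edate"] >= pd.Timestamp("2014-09-14")]
print(recent)
```

Once edate is a real datetime column, the filter from the question no longer depends on how the dates were stored as text.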

If these don't get you all the way there, I have two ideas:

  1. You can put all this logic inside its own function, called something like clean_df, and call that on newly loaded data.
  2. You can save your cleaned data in a format other than .csv. One option (of many) is to save the DataFrame as a pickle, which is one way Python objects can be saved to a file on disk. Loading a DataFrame from a pickle brings it back pretty much exactly as you saved it, with no need to redo the cleaning.
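A rough sketch of both ideas together. It assumes a hypothetical clean_df that only does the date conversion and the edate >= 2014-09-14 filter from the question; the sample rows are invented, and the pickle is written to a temporary directory:

```python
import os
import tempfile

import pandas as pd


def clean_df(raw: pd.DataFrame, start: str = "2014-09-14") -> pd.DataFrame:
    """Hypothetical cleaning step: parse 'edate' and keep elections on or after `start`."""
    df = raw.copy()
    df.columns = [str(c) for c in df.columns]  # same column-name fix as in the question
    df["edate"] = pd.to_datetime(df["edate"])  # assumes pandas can parse the format as-is
    return df[df["edate"] >= pd.Timestamp(start)].reset_index(drop=True)


# Invented rows standing in for freshly loaded data.
raw = pd.DataFrame(
    {
        "party": [11110, 11320, 11620],
        "edate": ["2010-09-19", "2014-09-14", "2018-09-09"],
    }
)

cleaned = clean_df(raw)

# Save once; later sessions can reload the cleaned frame instead of re-cleaning.
path = os.path.join(tempfile.mkdtemp(), "mpds_clean.pkl")
cleaned.to_pickle(path)
restored = pd.read_pickle(path)

print(restored.equals(cleaned))  # True: values, dtypes and index survive the round trip
```

In a real script you would then call dtale.show(...) on the cleaned (or reloaded) frame, so the conversion and the filter are already applied before D-Tale ever sees the data.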

Also, a small note: I don't think you need to import pandas twice.
