
Split a CSV file in Python by the top 30 column values

Suppose I have a huge data file that contains a Type column:

Date            Day Of Week Type
4/9/2015 0:00   Thursday    BATTERY
3/9/2015 0:00   Monday      THEFT
4/3/2015 0:00   Friday      DECEPTIVE PRACTICE
1/1/2015 0:01   Thursday    DECEPTIVE PRACTICE
4/10/2015 0:01  Friday      OTHER OFFENSE
3/27/2015 0:01  Friday      DECEPTIVE PRACTICE
4/10/2015 0:35  Friday      BATTERY


Can I generate a CSV file with the top 30 types in this file?

I would use .groupby() together with .count(), and then .sort_values() on the newly created count column:

df_grouped = df[['Type','Date']].groupby(['Type'])['Date'] \
                                .count() \
                                .reset_index(name='count') \
                                .sort_values(['count'], ascending=False)

df_top_types = df_grouped.head(2)

For the sake of this example, I retrieve only the first two rows of the new DataFrame; in your case, select the first 30 rows.

You can then save the new DataFrame as a .csv file using the following command:

df_top_types.to_csv('CSV_Top_Type.csv', index=False, header=True)

The CSV file will be saved in your current working directory under the name 'CSV_Top_Type.csv'.
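The steps above can be put together as a minimal, self-contained sketch. It assumes the data is already loaded into a DataFrame; here the sample rows from the question are typed in by hand, and the helper name `top_types` is hypothetical, chosen only for illustration. For a real file you would presumably load the data with `pd.read_csv` instead.

```python
import pandas as pd

# Sample rows copied from the question; a real run would load the full file instead.
df = pd.DataFrame({
    "Date": ["4/9/2015 0:00", "3/9/2015 0:00", "4/3/2015 0:00", "1/1/2015 0:01",
             "4/10/2015 0:01", "3/27/2015 0:01", "4/10/2015 0:35"],
    "Day Of Week": ["Thursday", "Monday", "Friday", "Thursday",
                    "Friday", "Friday", "Friday"],
    "Type": ["BATTERY", "THEFT", "DECEPTIVE PRACTICE", "DECEPTIVE PRACTICE",
             "OTHER OFFENSE", "DECEPTIVE PRACTICE", "BATTERY"],
})


def top_types(frame, n):
    """Return the n most frequent Type values with their row counts.

    Hypothetical helper: it wraps the groupby/count/sort chain from the answer.
    """
    return (
        frame.groupby("Type")["Date"]
        .count()
        .reset_index(name="count")
        .sort_values("count", ascending=False)
        .head(n)
    )


# Two rows for the small sample; use 30 for the real data.
df_top_types = top_types(df, 2)
print(df_top_types)
```

Calling `df_top_types.to_csv('CSV_Top_Type.csv', index=False, header=True)` afterwards writes the result to disk as described.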

Alternatively, depending on the format in which you want your data stored, you could use the following code:

df_top_types = df[['Type','Date']].groupby(['Type'])['Date'].count().nlargest(2).rename('count')
# Keep the index when saving: it holds the Type labels
df_top_types.to_csv('CSV_Top_Type.csv', index=True, header=True)

Pass 30 as the argument to .nlargest().
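If the goal is instead to keep the original rows that belong to the most frequent types (one possible reading of the question, not something the answer states), a sketch along the following lines might work. The helper name `rows_for_top_types`, the sample data, and the output filename in the final comment are all hypothetical.

```python
import pandas as pd


def rows_for_top_types(frame, n, column="Type"):
    """Return only the rows whose `column` value is among the n most frequent.

    Hypothetical helper for illustration; not part of pandas.
    """
    # value_counts() counts rows per distinct value; nlargest(n) keeps the top n.
    top = frame[column].value_counts().nlargest(n).index
    return frame[frame[column].isin(top)]


# Small sample modelled on the question's Type column.
sample = pd.DataFrame({
    "Type": ["BATTERY", "THEFT", "DECEPTIVE PRACTICE", "DECEPTIVE PRACTICE",
             "OTHER OFFENSE", "DECEPTIVE PRACTICE", "BATTERY"],
})

filtered = rows_for_top_types(sample, 2)  # use 30 for the real data
print(filtered)

# To save the filtered rows (hypothetical output name):
# filtered.to_csv("rows_top_types.csv", index=False)
```

Here `isin` keeps each matching row in its original order, so the output file has the same columns as the input, only with fewer rows.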
