Split a CSV file in Python by the top 30 column values

Question

Suppose I have a huge data file containing a Type column:

Date            Day Of Week Type
4/9/2015 0:00   Thursday    BATTERY
3/9/2015 0:00   Monday      THEFT
4/3/2015 0:00   Friday      DECEPTIVE PRACTICE
1/1/2015 0:01   Thursday    DECEPTIVE PRACTICE
4/10/2015 0:01  Friday      OTHER OFFENSE
3/27/2015 0:01  Friday      DECEPTIVE PRACTICE
4/10/2015 0:35  Friday      BATTERY


Can I generate a CSV file containing the top 30 types in this file?

Answer 1

I would use .groupby() together with .count(), and then .sort_values() on the newly created count column:

df_grouped = df[['Type','Date']].groupby(['Type'])['Date'] \
                                .count() \
                                .reset_index(name='count') \
                                .sort_values(['count'], ascending=False)

df_top_types = df_grouped.iloc[0:2]

For the sake of this example, I only retrieve the first two rows of the new DataFrame; in your case, select the first 30 rows.
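A minimal, self-contained sketch of the same groupby/count/sort pipeline, run on the sample rows from the question. It assumes the file is comma-separated with the column names shown above; `SAMPLE` and `top_types` are hypothetical names used only for illustration, and `.head(n)` is used in place of `.iloc[0:n]` (both select the first n rows).

```python
import io

import pandas as pd

# Sample rows from the question, as CSV text. On real data you would call
# pd.read_csv("your_file.csv") instead (hypothetical file name).
SAMPLE = """Date,Day Of Week,Type
4/9/2015 0:00,Thursday,BATTERY
3/9/2015 0:00,Monday,THEFT
4/3/2015 0:00,Friday,DECEPTIVE PRACTICE
1/1/2015 0:01,Thursday,DECEPTIVE PRACTICE
4/10/2015 0:01,Friday,OTHER OFFENSE
3/27/2015 0:01,Friday,DECEPTIVE PRACTICE
4/10/2015 0:35,Friday,BATTERY
"""


def top_types(df: pd.DataFrame, n: int = 30) -> pd.DataFrame:
    """Count rows per Type and keep the n most frequent types (hypothetical helper)."""
    counts = (
        df.groupby("Type")["Date"]
        .count()                    # number of non-null Date values per Type
        .reset_index(name="count")  # back to a DataFrame with columns Type, count
        .sort_values("count", ascending=False)
    )
    return counts.head(n)           # same rows as counts.iloc[0:n]


df = pd.read_csv(io.StringIO(SAMPLE))
result = top_types(df, n=2)
print(result)
```

On the full file you would call `top_types(df)` with the default `n=30` and then save the result as shown below.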

You can then save the new DataFrame as a .csv file using the following command:

df_top_types.to_csv('CSV_Top_Type.csv', index=False, header=True)

The CSV file will be saved in your current working directory as 'CSV_Top_Type.csv'.
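Since the question mentions a huge data file, here is a sketch of a swapped-in, lower-memory alternative that does not use pandas: it streams the CSV row by row with the standard `csv` module, counts types with `collections.Counter`, and writes the top n types with their counts. It assumes a comma-separated file with a `Type` header; `count_types`, `write_top_types` and the file names in the comments are hypothetical.

```python
import csv
import io
from collections import Counter

# Sample rows from the question; on real data, pass an open file instead, e.g.
# with open("big_file.csv", newline="") as f: counts = count_types(f)  (hypothetical name)
SAMPLE = """Date,Day Of Week,Type
4/9/2015 0:00,Thursday,BATTERY
3/9/2015 0:00,Monday,THEFT
4/3/2015 0:00,Friday,DECEPTIVE PRACTICE
1/1/2015 0:01,Thursday,DECEPTIVE PRACTICE
4/10/2015 0:01,Friday,OTHER OFFENSE
3/27/2015 0:01,Friday,DECEPTIVE PRACTICE
4/10/2015 0:35,Friday,BATTERY
"""


def count_types(lines, column="Type"):
    """Read the CSV one row at a time and count the values in `column`."""
    reader = csv.DictReader(lines)
    return Counter(row[column] for row in reader)


def write_top_types(counter, out, n=30):
    """Write a header plus the n most common values and their counts as CSV."""
    writer = csv.writer(out)
    writer.writerow(["Type", "count"])
    writer.writerows(counter.most_common(n))


counts = count_types(io.StringIO(SAMPLE))
buf = io.StringIO()  # on real data: open("CSV_Top_Type.csv", "w", newline="")
write_top_types(counts, buf, n=2)
print(buf.getvalue())
```

Only the counter (one entry per distinct type) is kept in memory, so the file size itself does not matter much.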

Alternatively, depending on the format in which you want the data stored, you could use the following code:

top_types = df.groupby('Type')['Date'].count().nlargest(2).rename('count')
# Keep index=True: the index of this Series holds the Type names
top_types.to_csv('CSV_Top_Type.csv', index=True, header=True)

Pass 30 as the argument to .nlargest() to keep the top 30 types.
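If "top 30 types" is meant as in the title, i.e. splitting out the original rows whose Type is among the most frequent ones, a sketch could combine `.nlargest()` with `.isin()`. This is an assumed interpretation of the question, not part of the original answer; the sample data and the output file name in the comment are hypothetical.

```python
import io

import pandas as pd

# Sample rows from the question (stand-in for the real file)
SAMPLE = """Date,Day Of Week,Type
4/9/2015 0:00,Thursday,BATTERY
3/9/2015 0:00,Monday,THEFT
4/3/2015 0:00,Friday,DECEPTIVE PRACTICE
1/1/2015 0:01,Thursday,DECEPTIVE PRACTICE
4/10/2015 0:01,Friday,OTHER OFFENSE
3/27/2015 0:01,Friday,DECEPTIVE PRACTICE
4/10/2015 0:35,Friday,BATTERY
"""

df = pd.read_csv(io.StringIO(SAMPLE))

n = 2  # use 30 for the real data
# Series indexed by Type, holding the row counts of the n most frequent types
top = df.groupby("Type")["Date"].count().nlargest(n)

# Keep only the original rows whose Type is one of the top n
top_rows = df[df["Type"].isin(top.index)]
# top_rows.to_csv("CSV_Top_Type_rows.csv", index=False)  # hypothetical file name
print(top_rows)
```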

Answer 1: score 0, posted 2020-11-24 11:15:56