
How do I split a single dataframe into multiple dataframes by the range of a column value?

First off, I realize that this question has been asked a ton of times in many different forms, but a lot of the answers just give code that solves the problem without explaining what the code actually does or why it works.

I have an enormous data set of phone numbers and area codes that I have loaded into a dataframe in Python to do some processing with. Before I do that processing, I need to split the single dataframe into multiple dataframes, each containing the phone numbers whose area codes fall within a certain range, so that I can then do more processing on each one. For example:

+---+--------------+-----------+
|   | phone_number | area_code |
+---+--------------+-----------+
| 1 | 5501231234   | 550       |
+---+--------------+-----------+
| 2 | 5051231234   | 505       |
+---+--------------+-----------+
| 3 | 5001231234   | 500       |
+---+--------------+-----------+
| 4 | 6201231234   | 620       |
+---+--------------+-----------+

into

area-codes (500-550)
+---+--------------+-----------+
|   | phone_number | area_code |
+---+--------------+-----------+
| 1 | 5501231234   | 550       |
+---+--------------+-----------+
| 2 | 5051231234   | 505       |
+---+--------------+-----------+
| 3 | 5001231234   | 500       |
+---+--------------+-----------+

and

area-codes (600-650)
+---+--------------+-----------+
|   | phone_number | area_code |
+---+--------------+-----------+
| 1 | 6201231234   | 620       |
+---+--------------+-----------+

I get that this should be possible using pandas (specifically groupby and a Series object, I think), but the documentation and examples I could find online were too nebulous or sparse for me to follow. Maybe there's a better way to do this than the way I'm trying to do it?
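The groupby-plus-Series idea from the question does work: groupby accepts any Series aligned with the dataframe's index as its grouping key. A minimal sketch, assuming fixed-width buckets of 100 area codes (500-599, 600-699). These differ from the 500-550 / 600-650 ranges above but produce the same two groups for this sample; the names `bucket` and `groups` are illustrative only:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "phone_number": [5501231234, 5051231234, 5001231234, 6201231234],
    "area_code": [550, 505, 500, 620],
})

# Derived Series: floor each area code to a multiple of 100
# (assumed bucket width), e.g. 550 -> 500, 620 -> 600.
bucket = (df["area_code"] // 100) * 100

# groupby(bucket) groups rows by the bucket value on the same index;
# each group is stored in a dict under a "start-end" label.
groups = {f"{start}-{start + 99}": g for start, g in df.groupby(bucket)}

for name, frame in groups.items():
    print(name)
    print(frame)
```

This only works when the ranges all have the same width; for arbitrary boundaries, pd.cut or explicit boolean masks are the more flexible tools.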

You can use pd.cut to bin the area_code column, then use the bin labels to group the data and store each group in a dictionary. Finally, print each key to see its dataframe:

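import pandas as pd

df = pd.DataFrame({'phone_number': [5501231234, 5051231234, 5001231234, 6201231234],
                   'area_code': [550, 505, 500, 620]})

# Right-closed bins [500, 550], (550, 600], (600, 650]; include_lowest keeps 500 in the first bin
bins = [500, 550, 600, 650]
labels = ['500-550', '550-600', '600-650']

# Group rows by bin label (observed=True: only non-empty bins) and key each group by its label
d = {f'area_code_{i}': g for i, g in
     df.groupby(pd.cut(df.area_code, bins, include_lowest=True, labels=labels),
                observed=True)}

print(d['area_code_500-550'])
print('\n')
print(d['area_code_600-650'])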

   phone_number  area_code
0    5501231234        550
1    5051231234        505
2    5001231234        500


   phone_number  area_code
3    6201231234        620
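To see what pd.cut contributes in the answer above, it helps to inspect the binned Series before grouping. A small sketch using the same sample data, bins and labels; the names `binned`, `outside` and `d` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "phone_number": [5501231234, 5051231234, 5001231234, 6201231234],
    "area_code": [550, 505, 500, 620],
})

bins = [500, 550, 600, 650]
labels = ["500-550", "550-600", "600-650"]

# One label per row; intervals are right-closed, so 550 lands in "500-550",
# and include_lowest=True makes 500 part of the first interval as well.
binned = pd.cut(df["area_code"], bins, include_lowest=True, labels=labels)
print(binned.tolist())  # ['500-550', '500-550', '500-550', '600-650']

# Values outside every bin get NaN instead of a label.
outside = pd.cut(pd.Series([499, 651]), bins, include_lowest=True, labels=labels)
print(outside.isna().all())  # True

# Grouping by the labels; observed=True keeps only bins that contain rows.
d = {f"area_code_{k}": g for k, g in df.groupby(binned, observed=True)}
print(sorted(d))  # ['area_code_500-550', 'area_code_600-650']
```

Because each row gets at most one label, every row ends up in at most one dataframe, and rows whose area code falls outside all bins (NaN label) are dropped by groupby rather than put in any group.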

You can also do this by selecting rows of the dataframe with multiple conditions chained by the & (and) or | (or) operator. Each comparison must be wrapped in its own parentheses, because & and | bind more tightly than >= and <=.

  • df1 selects rows with area_code between 500 and 550

  • df2 selects rows with area_code between 600 and 650


import pandas as pd

df = pd.DataFrame({'phone_number': [5501231234, 5051231234, 5001231234, 6201231234],
                   'area_code': [550, 505, 500, 620]},
                  columns=['phone_number', 'area_code'])
# Both bounds are inclusive; each comparison is parenthesized before combining with &
df1 = df[(df['area_code'] >= 500) & (df['area_code'] <= 550)]
df2 = df[(df['area_code'] >= 600) & (df['area_code'] <= 650)]

df1
   phone_number  area_code
0    5501231234        550
1    5051231234        505
2    5001231234        500

df2
   phone_number  area_code
3    6201231234        620
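The boolean-mask approach extends to any number of ranges. A sketch assuming the ranges are kept in a dictionary (the `ranges` and `frames` names are hypothetical); Series.between with its default settings includes both endpoints, matching the >= / <= pair above:

```python
import pandas as pd

df = pd.DataFrame({
    "phone_number": [5501231234, 5051231234, 5001231234, 6201231234],
    "area_code": [550, 505, 500, 620],
})

# Hypothetical range table: name -> (low, high), both ends inclusive.
ranges = {"500-550": (500, 550), "600-650": (600, 650)}

# between(low, high) builds the same mask as
# (df["area_code"] >= low) & (df["area_code"] <= high).
frames = {
    name: df[df["area_code"].between(low, high)]
    for name, (low, high) in ranges.items()
}

for name, frame in frames.items():
    print(name)
    print(frame)
```

Unlike pd.cut, this lets ranges overlap or leave gaps: a row can appear in several frames, or in none (here, area codes 551 to 599 would be excluded).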
