Edit data in a python pandas filter and apply it to the original data frame

I am trying to figure out how to filter data in pandas, assign a value to a column for every row that meets the filter criteria, and have that change affect the original data frame. Here is my closest attempt so far, but it throws a lot of informational warnings:

    import pandas as pd
    df = pd.read_csv('http://www.sharecsv.com/dl/9096d32f98aa0ac671a1cca16fa43be8/SalesJan2009.csv')
    df['Zone'] = ''
    zone1 = df[(df['Latitude'] > 0) & (df['Latitude'] > 0)]
    zone2 = df[(df['Latitude'] < 0) & (df['Latitude'] > 0)]
    zone3 = df[(df['Latitude'] > 0) & (df['Latitude'] < 0)]
    zone4 = df[(df['Latitude'] < 0) & (df['Latitude'] < 0)]
    zone1[['Zone']] = zone1[['Zone']] = 1
    zone2[['Zone']] = zone1[['Zone']] = 2
    zone3[['Zone']] = zone1[['Zone']] = 3
    zone4[['Zone']] = zone1[['Zone']] = 4
    df

This does not affect the original data frame at all; it only sets the values in the filtered subsets.

I assume I may need to pull out everything that matches each of my filters, remove it from the original, and then concatenate the changed rows back onto the original?

This is a random dataset to illustrate what I am trying to do. My actual dataset has rows that do not meet any filter criteria, and I need to keep those as unknown, because the filters will not cover every row the way they do in this example.

I am trying to avoid looping over every row and checking the criteria row by row, so if anyone knows how I can accomplish this I would be super grateful!

IIUC, you are trying to do something like this:

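import numpy as np

# One boolean mask per quadrant; each is aligned with df's index.
zone1 = (df['Latitude'] > 0) & (df['Longitude'] > 0)
zone2 = (df['Latitude'] < 0) & (df['Longitude'] > 0)
zone3 = (df['Latitude'] > 0) & (df['Longitude'] < 0)
zone4 = (df['Latitude'] < 0) & (df['Longitude'] < 0)

# Rows matching none of the masks get the default; the 'Unknown' label is an arbitrary choice.
df['Zone'] = np.select([zone1, zone2, zone3, zone4], ['Zone 1', 'Zone 2', 'Zone 3', 'Zone 4'], default='Unknown')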

Output:

  Transaction_date   Product Price Payment_Type               Name  \
0      1/2/09 6:17  Product1  1200   Mastercard           carolina   
1      1/2/09 4:53  Product1  1200         Visa             Betina   
2     1/2/09 13:08  Product1  1200   Mastercard  Federica e Andrea   
3     1/3/09 14:44  Product1  1200         Visa              Gouya   
4     1/4/09 12:56  Product2  3600         Visa            Gerd W    

                           City     State         Country Account_Created  \
0                      Basildon   England  United Kingdom     1/2/09 6:00   
1  Parkville                           MO   United States     1/2/09 4:42   
2  Astoria                             OR   United States    1/1/09 16:21   
3                        Echuca  Victoria       Australia   9/25/05 21:13   
4  Cahaba Heights                      AL   United States  11/15/08 15:47   

     Last_Login   Latitude   Longitude    Zone  
0   1/2/09 6:08  51.500000   -1.116667  Zone 3  
1   1/2/09 7:49  39.195000  -94.681940  Zone 3  
2  1/3/09 12:32  46.188060 -123.830000  Zone 3  
3  1/3/09 14:22 -36.133333  144.750000  Zone 2  
4  1/4/09 12:45  33.520560  -86.802500  Zone 3  
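
Since the question mentions rows that meet none of the filters, here is a minimal sketch of how `np.select` treats them. The sample frame is made up for illustration (it is not the CSV from the question), and the `'Unknown'` label is an assumption; any default value of a compatible type would work.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the last two rows fall in no quadrant
# (latitude of exactly 0, and a missing latitude).
df = pd.DataFrame({
    'Latitude':  [51.5, -36.1, 39.2, -23.5, 0.0, np.nan],
    'Longitude': [10.0, 144.75, -94.7, -46.6, 5.0, 20.0],
})

conditions = [
    (df['Latitude'] > 0) & (df['Longitude'] > 0),
    (df['Latitude'] < 0) & (df['Longitude'] > 0),
    (df['Latitude'] > 0) & (df['Longitude'] < 0),
    (df['Latitude'] < 0) & (df['Longitude'] < 0),
]
labels = ['Zone 1', 'Zone 2', 'Zone 3', 'Zone 4']

# The first matching condition wins; rows matching nothing get the default.
df['Zone'] = np.select(conditions, labels, default='Unknown')
zones = df['Zone'].tolist()
```

Comparisons against NaN evaluate to False, so rows with missing coordinates end up with the default label as well.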

You missed that both conditions in each of your filters check Latitude; the second one should check Longitude. You should also look at .loc, which is the right way to change values in part of a dataframe.

import pandas as pd
df = pd.read_csv('http://www.sharecsv.com/dl/9096d32f98aa0ac671a1cca16fa43be8/SalesJan2009.csv')
df['Zone'] = ''
zone1 = (df['Latitude'] > 0) & (df['Longitude'] > 0)
zone2 = (df['Latitude'] < 0) & (df['Longitude'] > 0)
zone3 = (df['Latitude'] > 0) & (df['Longitude'] < 0)
zone4 = (df['Latitude'] < 0) & (df['Longitude'] < 0)
df.loc[zone1, 'Zone'] = 1
df.loc[zone2, 'Zone'] = 2
df.loc[zone3, 'Zone'] = 3
df.loc[zone4, 'Zone'] = 4
df
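
To see why the attempt in the question left `df` unchanged while `.loc` does not, here is a small sketch on made-up data (the frame and the `'north'` label are illustrative assumptions, not taken from the CSV). Filtering with a boolean mask produces a separate DataFrame, so assigning to it never reaches the original; `.loc` with the same mask writes into the original.

```python
import warnings
import pandas as pd

# Hypothetical three-row frame standing in for the real data.
df = pd.DataFrame({'Latitude': [51.5, -36.1, 0.0],
                   'Longitude': [-1.1, 144.75, 0.0]})
df['Zone'] = ''

north = df['Latitude'] > 0

# Boolean indexing returns a new DataFrame, so this edit stays in `subset`.
subset = df[north]
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # hide the SettingWithCopyWarning some pandas versions emit
    subset['Zone'] = 'north'
untouched = df['Zone'].tolist()

# .loc with the same mask assigns into df itself.
df.loc[north, 'Zone'] = 'north'
updated = df['Zone'].tolist()
```

The row with latitude 0.0 keeps its empty `Zone` in both cases, which is how unmatched rows stay "unknown" with the `.loc` approach.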
