Nested Loops and DataFrame - Python and Pandas

I am having trouble writing a loop that returns what I need. I have two CSV files. For each value in a column of CSV 1, I need to check whether CSV 2 contains a matching value and, if it does, return a dataframe containing the matching row. When I try to write the loop, I cannot get the right values out of it. For example:

import pandas as pd

csv2 = pd.read_csv('/users/jamesh/documents/asiopods/asicrawlconcat.csv', header=1)
csv1 = pd.read_csv('/users/jamesh/documents/asiopods/asiconcat.csv', header=0)

h1 = csv1['Recommended_H1']
h1[0:3]  # test
subject = csv2['H1_1']

# Compare every value in csv1 against every value in csv2
for x in h1:
    for y in subject:
        if x == y:
            print(y)

The code above returns the values I need, but as strings. I need to return the dataframe rows from CSV 2 that correspond to the values of y.

Any help or direction is greatly appreciated!

Edit - with some offline help, I have been able to get the correct information from the loop. However, I still can't figure out how to get the data into a pandas DataFrame. Instead, the data is returned in a vertical manner. Here is the new loop:

def foogaiz():
    for k1, v1 in h1.items():
        for k2, v2 in subject.items():
            if v1 == v2:
                # Positional lookup; assumes csv2 has the default integer index
                data = csv2.iloc[k2]
                return data

It's a little unclear whether the values you're matching on ("Recommended_H1" in your example) are unique and appear only once in asiconcat.csv. If so, I recommend giving the two columns that hold the matching values the same name ('H1_1' in my example syntax below) and doing a df.merge():

matched_df = df.merge(crawldf,on="H1_1",how="left")

The left join option keeps the rows of df that don't have a match in crawldf.
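To make the merge concrete, here is a minimal sketch with made-up data standing in for the two CSV files. The sample values and the URL column are assumptions for illustration only; df and crawldf play the roles of asiconcat.csv and asicrawlconcat.csv, and the rename step is one way to give both key columns the same name.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files (values are invented)
df = pd.DataFrame({"Recommended_H1": ["Red Pens", "Blue Mugs", "Green Bags"]})
crawldf = pd.DataFrame({
    "H1_1": ["Blue Mugs", "Red Pens", "Yellow Hats"],
    "URL": ["/mugs", "/pens", "/hats"],  # assumed extra column
})

# Give the key column the same name in both frames
df = df.rename(columns={"Recommended_H1": "H1_1"})

# Left join: every row of df is kept; unmatched rows get NaN in crawldf's columns
matched_df = df.merge(crawldf, on="H1_1", how="left")

# Inner join: only rows whose key appears in both frames
only_matches = df.merge(crawldf, on="H1_1", how="inner")

print(matched_df)
print(only_matches)
```

If you only want the rows that actually matched, how="inner" is probably closer to what the loop was trying to do.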

You can read the documentation for merge here:

http://pandas.pydata.org/pandas-docs/stable/merging.html
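If you would rather not rename columns, another option (not the merge approach above, just an alternative) is to filter csv2 with a boolean mask built by Series.isin. This is a rough sketch using invented sample data in place of the real files. It also shows why the loop's result looked "vertical": selecting a single row with a scalar gives a Series, while selecting with a list gives a one-row DataFrame.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files (values are invented)
csv1 = pd.DataFrame({"Recommended_H1": ["Red Pens", "Blue Mugs", "Green Bags"]})
csv2 = pd.DataFrame({
    "H1_1": ["Blue Mugs", "Red Pens", "Yellow Hats"],
    "URL": ["/mugs", "/pens", "/hats"],  # assumed extra column
})

# True for each row of csv2 whose H1_1 value also appears in csv1
mask = csv2["H1_1"].isin(csv1["Recommended_H1"])

# Boolean indexing keeps the whole matching rows as a DataFrame
matches = csv2[mask]
print(matches)

# A scalar position returns a Series (printed vertically);
# a list of positions returns a DataFrame
as_series = csv2.iloc[0]
as_frame = csv2.iloc[[0]]
```

Unlike a merge, this keeps only csv2's columns and never duplicates rows when a value appears more than once in csv1.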
