
Python pandas return value from other column

I have a file "specieslist.txt" which contains the following information:

Bacillus,genus
Borrelia,genus
Burkholderia,genus
Campylobacter,genus

Now I want Python to look up a value in the first column (in this example "Campylobacter") and return the corresponding value from the second column ("genus"). I wrote the following code:

import csv
import pandas as pd

species_import = 'Campylobacter'
df = pd.read_csv('specieslist.txt', header=None, names = ['species', 'level'] )
input = df.loc[df['species'] == species_import]
print (input['level'])

However, my code returns too much, while I only want "genus":

3    genus
Name: level, dtype: object

You can select the first value of the Series with iat:

species_import = 'Campylobacter'
out = df.loc[df['species'] == species_import, 'level'].iat[0]
#alternative
#out = df.loc[df['species'] == species_import, 'level'].values[0]
print (out)
genus
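
To make the difference between the Series and the scalar concrete, here is a minimal, self-contained sketch. It assumes the same four rows as specieslist.txt, loaded from an in-memory string rather than from the file so that it runs anywhere.

```python
import io

import pandas as pd

# Assumption: same contents as specieslist.txt, inlined for a self-contained run.
data = "Bacillus,genus\nBorrelia,genus\nBurkholderia,genus\nCampylobacter,genus\n"
df = pd.read_csv(io.StringIO(data), header=None, names=['species', 'level'])

species_import = 'Campylobacter'
matches = df.loc[df['species'] == species_import, 'level']

# Boolean indexing yields a Series even for a single match,
# which is why printing it also shows the index and the dtype.
print(type(matches).__name__)

# .iat[0] picks the first element by position and returns the plain value.
out = matches.iat[0]
print(out)
```

Note that iat[0] only takes the first match; if several rows matched, the rest would be silently ignored.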

A better solution, which also works if no value matches and an empty Series is returned (in that case it returns no match):

Comment from @jpp:
This solution is better only when you have a large Series and the matched value is expected to be near the top.

species_import = 'Campylobacter'
out = next(iter(df.loc[df['species'] == species_import, 'level']), 'no match')
print (out)
genus
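
As a small sketch of the empty case, the following uses a made-up two-row frame (an assumption, not the original file) and a species name that does not occur in it. It shows that next(iter(...), default) falls back to the default, while iat[0] on the same empty Series raises IndexError.

```python
import pandas as pd

# Hypothetical two-row frame, just enough to reproduce the empty-match case.
df = pd.DataFrame({'species': ['Bacillus', 'Campylobacter'],
                   'level': ['genus', 'genus']})

empty = df.loc[df['species'] == 'aaa', 'level']  # no row matches

# Iterating an empty Series yields nothing, so next() returns the default.
fallback = next(iter(empty), 'no match')
print(fallback)

# .iat[0] on the same empty Series raises IndexError instead.
try:
    empty.iat[0]
    raised = False
except IndexError:
    raised = True
print(raised)
```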

EDIT:

Idea from the comments, thanks @jpp:

def get_first_val(val):
    try:
        return df.loc[df['species'] == val, 'level'].iat[0]
    except IndexError:
        return 'no match'

print (get_first_val(species_import))
genus

print (get_first_val('aaa'))
no match

EDIT:

import numpy as np

df = pd.DataFrame({'species':['a'] * 10000 + ['b'], 'level':np.arange(10001)})

def get_first_val(val):
    try:
        return df.loc[df['species'] == val, 'level'].iat[0]
    except IndexError:
        return 'no match'


In [232]: %timeit next(iter(df.loc[df['species'] == 'a', 'level']), 'no match')
1.3 ms ± 33.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [233]: %timeit (get_first_val('a'))
1.1 ms ± 21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)



In [235]: %timeit (get_first_val('b'))
1.48 ms ± 206 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [236]: %timeit next(iter(df.loc[df['species'] == 'b', 'level']), 'no match')
1.24 ms ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Performance of various methods, to demonstrate when it is useful to use next(...):

import numpy as np
import pandas as pd

n = 10**6
df = pd.DataFrame({'species': ['b']+['a']*n, 'level': np.arange(n+1)})

def get_first_val(val):
    try:
        return df.loc[df['species'] == val, 'level'].iat[0]
    except IndexError:
        return 'no match'

%timeit next(iter(df.loc[df['species'] == 'b', 'level']), 'no match')     # 123 ms per loop
%timeit get_first_val('b')                                                # 125 ms per loop
%timeit next(idx for idx, val in enumerate(df['species']) if val == 'b')  # 20.3 µs per loop
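
The last timing returns the position of the first match rather than its level. A minimal sketch of a lookup built on the same lazy scan could look like the following; first_level_lazy is a hypothetical helper name, and n is reduced from the benchmark so the example runs quickly.

```python
import numpy as np
import pandas as pd

n = 10**5  # assumption: smaller than the benchmark above, to keep the run short
df = pd.DataFrame({'species': ['b'] + ['a'] * n, 'level': np.arange(n + 1)})

def first_level_lazy(val, default='no match'):
    """Return the level of the first row whose species equals val (hypothetical helper)."""
    # The generator stops at the first hit, so a match near the top is cheap.
    pos = next((i for i, s in enumerate(df['species']) if s == val), None)
    return default if pos is None else df['level'].iat[pos]

print(first_level_lazy('b'))    # match in the first row
print(first_level_lazy('zzz'))  # no match: the whole column is scanned
```

The trade-off is that a missing value, or a match near the bottom, forces a full Python-level loop over the column, which is far slower than the vectorized comparison used by the other two methods.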

get

With pandas.Series.get, you can return either a scalar value if the 'species' is unique, or a pandas.Series if it is not unique.

f = df.set_index('species').level.get

f('Campylobacter')

'genus'

If the key is not in the data, you can provide a default:

f('X', 'Not In Data')

'Not In Data'
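
The scalar-versus-Series behaviour can be checked with a short sketch. The frames below are illustrative assumptions: one mirrors the species list with unique names, the other adds a made-up duplicate 'Borrelia' row.

```python
import pandas as pd

# Unique species names, as in the original file.
unique_df = pd.DataFrame({'species': ['Bacillus', 'Borrelia', 'Campylobacter'],
                          'level': ['genus', 'genus', 'genus']})

# Hypothetical variant in which 'Borrelia' appears twice.
dup_df = pd.DataFrame({'species': ['Bacillus', 'Borrelia', 'Borrelia'],
                       'level': ['genus', 'genus', 'species']})

f_unique = unique_df.set_index('species')['level'].get
f_dup = dup_df.set_index('species')['level'].get

scalar_hit = f_unique('Campylobacter')   # a single value
repeated_hit = f_dup('Borrelia')         # a Series holding both rows
missing = f_dup('X', 'Not In Data')      # default for an absent key

print(scalar_hit)
print(list(repeated_hit))
print(missing)
```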

We could also use dict.get and only return scalars. If a key is not unique, this will return the last one.

f = dict(zip(df.species, df.level)).get

If you want to return the first one, you can do that in a few ways:

f = dict(zip(df.species[::-1], df.level[::-1])).get 

Or

f = df.drop_duplicates('species').pipe(
    lambda d: dict(zip(d.species, d.level)).get
)
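
To see the last-versus-first behaviour of the three dict-based lookups side by side, here is a small sketch on a made-up frame in which 'Borrelia' appears twice with different levels (the data is an assumption for illustration).

```python
import pandas as pd

# Hypothetical data with a duplicated key.
df = pd.DataFrame({'species': ['Borrelia', 'Bacillus', 'Borrelia'],
                   'level': ['genus', 'genus', 'species']})

# Later pairs overwrite earlier ones, so the last occurrence wins.
last_wins = dict(zip(df.species, df.level)).get

# Reversing both columns makes the first occurrence the final write.
first_reversed = dict(zip(df.species[::-1], df.level[::-1])).get

# drop_duplicates keeps the first occurrence by default.
first_dedup = df.drop_duplicates('species').pipe(
    lambda d: dict(zip(d.species, d.level)).get
)

print(last_wins('Borrelia'))
print(first_reversed('Borrelia'))
print(first_dedup('Borrelia'))
print(last_wins('X', 'Not In Data'))
```
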
# Change the last line of your code to
print(input['level'].values)
# For an explanation, refer to the code below

import pandas as pd

species_import = 'Campylobacter'
df = pd.read_csv('specieslist.txt', header=None, names=['species', 'level'])

mask = df['species'] == species_import  # a boolean pandas Series

print(type(df[mask]))  # a pandas DataFrame

print(type(df[mask]['level']))  # a pandas Series

# To obtain the values from this Series as a NumPy array:
print(df[mask]['level'].values)  # prints ['genus']


