
Extract integers from a list of tuples and store them as new variables

In a data frame column, I have lists of tuples containing int, str and float values. My objective is to extract the numeric values and store them in new columns. If a list of tuples contains two numeric values, two variables should be created, one for each extracted value.

Input data -

List_Tuple 
[('watch','price','is','$','100')]
[('there', 'was', '2','apple','and','2','mango')]
[('2','cat'),('3','mouse')] 

I am not sure whether this can be done, and I cannot work out the next step. Please advise.

Expected Output -

Var1 Var2
100  
2    2
2    3

# my_list is one cell of the column: a list of tuples of strings
final = []
for tup in my_list:
    for item in tup:
        if item.isdigit():
            final.append(item)

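or as a list comprehension (the loop over my_list has to come first, then the loop over each tuple):

[item for tup in my_list for item in tup if item.isdigit()]

If the items are stored as real numbers rather than strings, check the type with isinstance(item, (int, float)) instead, e.g.:

[item for tup in my_list for item in tup if isinstance(item, (int, float))]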
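The values in the question are strings, so a type check will not match '100', and isdigit() will not match a decimal such as '99.5'. A minimal sketch of one way around this, assuming decimal values may appear as strings; is_number and the sample my_list are hypothetical and not part of the original data:

```python
def is_number(text):
    """Return True if text parses as an int or float (hypothetical helper)."""
    try:
        float(text)
    except (TypeError, ValueError):
        return False
    return True

# sample cell, made up for illustration: one list of tuples of strings
my_list = [('watch', 'price', 'is', '$', '99.5'), ('2', 'cat')]

numbers = [item for tup in my_list for item in tup if is_number(item)]
print(numbers)  # ['99.5', '2']
```

Note that float() also accepts strings such as 'nan' and 'inf', so filter those out if they can occur in your text.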

Edit: I believe this gives you the functionality you want:

import pandas as pd

df = pd.DataFrame([[[('watch','price','is','$','100')]],
                  [[('there', 'was', '2','apple','and','2','mango')]],
                  [[('2','cat'),('3','mouse')]]])

df.columns = ['x1']

def tuple_join(row):
    # row is one cell of x1: a list of tuples; scan every tuple, not only row[0]
    return [item for tup in row for item in tup if item.isdigit()]

# a1 holds the list of digit strings found in each row
df['a1'] = df.x1.apply(tuple_join)
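The a1 column holds one list per row, while the question asks for separate Var1 and Var2 columns. A minimal sketch of the padding step in plain Python, assuming missing values should become empty strings; to_columns and the Var prefix are illustrative names, not part of the answer above:

```python
def to_columns(rows, prefix='Var'):
    """Pad each list to the longest length and return a dict of columns."""
    width = max((len(r) for r in rows), default=0)
    padded = [list(r) + [''] * (width - len(r)) for r in rows]
    return {f'{prefix}{i + 1}': [row[i] for row in padded] for i in range(width)}

# digit strings per row, as they would be extracted from the sample data
extracted = [['100'], ['2', '2'], ['2', '3']]

columns = to_columns(extracted)
print(columns)  # {'Var1': ['100', '2', '2'], 'Var2': ['', '2', '3']}
```

A dict of equal-length lists like this can be passed to pd.DataFrame, or assigned column by column to an existing frame.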

Let us use the following test data:

List_Tuple = [
    [('watch','price','is','$','100')],
    [('there', 'were', '2','apples','and','2','mangos')],
    [('2','cats'),('3','mice')],
]

Note that some of your lists contain one tuple and some contain two. To search for the numeric values, it helps to merge the tuples of each row into a single sequence. chain.from_iterable from the itertools library is useful for this purpose.

Consider the following code:

import itertools as itts

for row in List_Tuple:
    print(*itts.chain.from_iterable(row))

The above code prints as follows:

watch price is $ 100
there were 2 apples and 2 mangos
2 cats 3 mice

All that remains is to extract the numbers:

import itertools as itts
import re  # regular expressions

def extract_numbers(in_data):
    out_data = list()
    for row in in_data:
        merged_row = itts.chain.from_iterable(row)
        # join with spaces so that adjacent numbers such as '2', '3' do not fuse into '23'
        merged_row = ' '.join(merged_row)
        # first run of digits, then an optional second run
        match = re.search(r"\D*(\d+)\D*(\d*)", merged_row)
        groups = match.groups() if match is not None else None
        out_data.append(groups)
    return out_data

print('\n'.join((str(x) for x in extract_numbers(List_Tuple))))

The last print statement displays:

('100', '')
('2', '2')
('2', '3')
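The pattern used here captures at most two numbers per row. If a row may contain more, re.findall returns every run of digits instead. A sketch under that assumption, with extract_all_numbers as a hypothetical name and the sample data copied from the test data above:

```python
import itertools
import re

def extract_all_numbers(in_data):
    """Return, for each row, every run of digits found in any of its tuples."""
    out_data = []
    for row in in_data:
        # flatten the row's tuples, then join with spaces so numbers stay separate
        merged_row = ' '.join(itertools.chain.from_iterable(row))
        out_data.append(re.findall(r'\d+', merged_row))
    return out_data

sample = [
    [('watch', 'price', 'is', '$', '100')],
    [('there', 'were', '2', 'apples', 'and', '2', 'mangos')],
    [('2', 'cats'), ('3', 'mice')],
]
print(extract_all_numbers(sample))  # [['100'], ['2', '2'], ['2', '3']]
```

Each row then yields a list of variable length, so a row without numbers gives an empty list rather than None.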
