Python Forum
Match and extract if found
#11
44.44.44.44.11 is not a valid IP address. A valid IPv4 address consists of 4 blocks of numbers from 0 to 255.
So the problem is that you want to recognize IPv6 addresses in order to filter them out, accept IPv4 addresses, and treat invalid data as if it were an IPv4 address?

from contextlib import suppress
from ipaddress import IPv6Address, ip_address


def make_ipv4_set(file_like):
    results = set()

    for line in map(str.strip, file_like):

        addr = None
        with suppress(ValueError):
            addr = ip_address(line)

        if isinstance(addr, IPv6Address):
            continue

        results.add(line)

    return results


with (
    open("ip1_file.txt") as ip1_file,
    open("ip2_file.txt") as ip2_file,
):
    ip1_set = make_ipv4_set(ip1_file)
    ip2_set = make_ipv4_set(ip2_file)


results = ip1_set - ip2_set

for ip in results:
    print(ip)
Output:
44.44.44.44.11
192.168.0.6
192.168.0.4
192.168.0.1
192.168.0.2
Maybe I should take money for it....
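
To see how the ipaddress module judges individual lines, here is a minimal sketch. The helper name classify and the sample strings are only assumptions for illustration, they are not taken from your files.

```python
from ipaddress import IPv6Address, ip_address


def classify(text):
    """Return 'ipv4', 'ipv6' or 'invalid' for one line of text."""
    try:
        addr = ip_address(text.strip())
    except ValueError:
        # ip_address raises ValueError for anything that is not a valid address
        return "invalid"
    return "ipv6" if isinstance(addr, IPv6Address) else "ipv4"


# made-up sample lines
for sample in ("192.168.0.1", "44.44.44.44.11", "256.1.1.1", "::1"):
    print(sample, "->", classify(sample))
```

With such a helper you could also decide to drop the invalid lines instead of keeping them, if that is what you want.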
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
Reply
#12
(Sep-09-2022, 09:46 AM)DeaD_EyE Wrote: 44.44.44.44.11 is not a Valid IP-Address. A valid IPv4Address consists of 4 blocks with numbers from 0 to 255.

For some reason it doesn't work, so I have sent you the files, File1 and File2.
The IP addresses present in File1 are still being displayed. The script should compare the two lists and display the addresses from File2 that aren't in File1.
Reply
#13
results = ip1_set - ip2_set
Only a small change.
results = ip2_set - ip1_set
Look here to see what sets do: https://realpython.com/python-sets/

After the change, I get only 4 IP addresses as the result with your data.
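
The order of the operands matters because the set difference is not symmetric. A small sketch with made-up addresses (not your data) shows it:

```python
# made-up example data, standing in for the contents of the two files
file1 = {"192.168.0.1", "192.168.0.2", "192.168.0.3"}
file2 = {"192.168.0.2", "192.168.0.3", "192.168.0.4"}

only_in_file1 = file1 - file2  # elements of file1 that are missing in file2
only_in_file2 = file2 - file1  # elements of file2 that are missing in file1
in_both = file1 & file2        # elements present in both sets

print(sorted(only_in_file1))  # ['192.168.0.1']
print(sorted(only_in_file2))  # ['192.168.0.4']
print(sorted(in_both))        # ['192.168.0.2', '192.168.0.3']
```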
Reply
#14
(Sep-12-2022, 06:37 PM)DeaD_EyE Wrote:
results = ip1_set - ip2_set
Only a small change.
results = ip2_set - ip1_set
Look here what sets do: https://realpython.com/python-sets/

After the change, I get only 4 IP-Addresses as result with your data.

The files I tested have 100k lines in file1 and 50k lines in file2, with a lot of new IPs, but I am getting no output.
Reply
#15
gives me no output :(
Reply
#16
(Sep-13-2022, 06:51 AM)Calli Wrote: gives me no output :(
I guess it is because you don't give DeaD_EyE enough money for the job!
Reply
#17
(Sep-13-2022, 06:54 AM)Gribouillis Wrote:
(Sep-13-2022, 06:51 AM)Calli Wrote: gives me no output :(
I guess it is because you don't give DeaD_EyE enough money for the job!

Yes, maybe :) But I was wrong, it's working fine now. Thank you so much, DeaD_EyE, and please add your BTC address to your signature so that I can donate some funds to you.
Reply
#18
(Sep-13-2022, 06:56 AM)Calli Wrote: Yes, maybe :) But I was wrong, it's working fine now. Thank you so much, DeaD_EyE, and please add your BTC address to your signature so that I can donate some funds to you.

I don't have a BTC address; the support is free. Perhaps someone else has the same question and this topic will help them.
Comparing, sorting, removing and adding IP addresses in lists (text files) is a very common programming task and often required by server admins.

In addition, people get a real-world example of sets. Set theory is very useful (not in school, but later).
Reply
#19
Just to get the bitcoins, I made 2 sets of random IPv4 addresses.

Comparing them is easy. Any other junk in your text files that is longer than 15 characters (the maximum length of a dotted IPv4 address) can easily be excluded.

import random

def makeIP():
    mylist = []
    for j in range(4):
        # the bigger the number X in random.randint(1, X), the less chance of overlap
        num = random.randint(1, 10)
        mylist.append(str(num))
    # join the four blocks once, after the loop, and terminate the line
    mystring = '.'.join(mylist) + '\n'
    return mystring

path2txt = '/home/pedro/myPython/random/'

with open(path2txt + 'ip1.txt', 'w') as ip1:
    ipstring = ''
    for i in range(1000):
        ip = makeIP()
        ipstring = ipstring + ip
    # write only the addresses; prefixing path2txt would put the directory name into the file
    ip1.write(ipstring)

with open(path2txt + 'ip2.txt', 'w') as ip2:
    ipstring = ''
    for i in range(1000):
        ip = makeIP()
        ipstring = ipstring + ip
    ip2.write(ipstring)

with open(path2txt + 'ip1.txt') as ip:
    mylist = ip.readlines()    
    myset1 = set(mylist)

with open(path2txt + 'ip2.txt') as ip:
    mylist = ip.readlines()    
    myset2 = set(mylist)

# get the intersection of myset1 and myset2
intersect = myset1 & myset2
difference1_2 = myset1 - myset2
difference2_1 = myset2 - myset1
Reply
#20
    mylist = ip.readlines()    
    myset1 = set(mylist)
Could be refactored to:

    myset1 = set(ip)
An open file object has an __iter__ method that yields the file line by line, but the line ending is not stripped.
The same is true for the readlines method of a file object.
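
The line endings matter when comparing sets. A minimal sketch, using io.StringIO objects as stand-ins for open files (the contents are made up, and I assume the last line of the second file has no trailing newline):

```python
import io

# stand-ins for two open text files with the same two addresses
file_a = io.StringIO("10.0.0.1\n10.0.0.2\n")
file_b = io.StringIO("10.0.0.2\n10.0.0.1")  # last line has no newline

# iterating a file object keeps the line endings
raw_a = set(file_a)
raw_b = set(file_b)
raw_diff = raw_a - raw_b  # "10.0.0.1\n" and "10.0.0.1" are different strings

# rewind and strip each line before building the sets
file_a.seek(0)
file_b.seek(0)
clean_a = set(map(str.strip, file_a))
clean_b = set(map(str.strip, file_b))
clean_diff = clean_a - clean_b

print(raw_diff)    # {'10.0.0.1\n'}
print(clean_diff)  # set()
```

Stripping each line first, for example with map(str.strip, ...), avoids such false differences.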

Code that works with paths should use pathlib.Path. Instead of:
path2txt = '/home/pedro/myPython/random/'
 
with open(path2txt + 'ip1.txt', 'w') as ip1:
With Path:
from pathlib import Path


path2txt = Path('/home/pedro/myPython/random/')
 
#with open(path2txt + 'ip1.txt', 'w') as ip1:

# then 3 possible ways to use it
with path2txt.joinpath('ip1.txt').open("w") as ip1:
    ...

with (path2txt / 'ip1.txt').open("w") as ip1:
    ...

text_file = path2txt / 'ip1.txt'
with text_file.open("w") as ip1:
    ...
https://realpython.com/python-pathlib/

The same code has an additional issue.

with open(path2txt + 'ip2.txt', 'w') as ip2:
    ipstring = ''
    for i in range(1000):
        ip = makeIP()
        ipstring = ipstring + ip    # <- here the str gets longer and longer and longer.....
    ip2.write(ipstring)    # write only the addresses, not path2txt + ipstring
This is memory-inefficient because a str is immutable.
What really happens behind the scenes is the creation of a new str that is big enough to hold both parts.
While ipstring + ip is being evaluated, the old ipstring and the resulting new str are both in memory.
Then the new str is assigned to ipstring.
The old str that ipstring pointed to before is now ready for garbage collection.

Use the str.join method to reduce the memory footprint.
ips = []
for i in range(1000):
    ips.append(makeIP())
# makeIP() already ends each address with "\n", so join with an empty separator
ipstring = "".join(ips)
To save even more memory:
for i in range(1000):
    ip1.write(makeIP())
The makeIP function could be simplified (here renamed to make_ip and using the full range from 0 to 255):
def make_ip():
    return ".".join(str(random.randint(0, 255)) for _ in range(4)) + "\n"
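
Putting the pieces together could look roughly like this. It is only a sketch: the helper names write_ips and read_ips are made up for illustration, this make_ip variant returns the address without the newline, and a temporary directory is used instead of the path from the post above.

```python
import random
import tempfile
from pathlib import Path


def make_ip():
    # four random blocks from 0 to 255, joined with dots (no newline here)
    return ".".join(str(random.randint(0, 255)) for _ in range(4))


def write_ips(path, count):
    # write address by address instead of building one big str first
    with path.open("w") as fh:
        for _ in range(count):
            fh.write(make_ip() + "\n")


def read_ips(path):
    # strip line endings and skip empty lines before building the set
    with path.open() as fh:
        return {line.strip() for line in fh if line.strip()}


with tempfile.TemporaryDirectory() as tmp:
    ip1_path = Path(tmp) / "ip1.txt"
    ip2_path = Path(tmp) / "ip2.txt"
    write_ips(ip1_path, 1000)
    write_ips(ip2_path, 1000)
    set1 = read_ips(ip1_path)
    set2 = read_ips(ip2_path)

print(len(set1 - set2), "addresses only in ip1.txt")
print(len(set2 - set1), "addresses only in ip2.txt")
print(len(set1 & set2), "addresses in both files")
```

Because the addresses are random, the counts differ from run to run.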
Reply

